DECtalk Software for Digital UNIX
Programmer's Guide

March 1996

This manual provides installation instructions, an overview, getting-started
information, and step-by-step procedures for the DECtalk Software Runtime kit
for the Digital UNIX product.

Revision/Update Information: This is a revised manual

Operating System: Digital UNIX 3.0 or later

Software Product Version: 4.2A

Digital Equipment Corporation
Maynard, Massachusetts

---------------------------------------------------------------------------

 Preface: About this Guide

This guide contains instructions for the installation of the DECtalk
Software product. It also contains the tutorial and reference information
you need to build a DECtalk Software application.
---------------------------------------------------------------------------

 What's the Difference Between the DECtalk Software Runtime
Kit and the DECtalk Software Development Kit?

DECtalk Software is packaged as a Runtime kit and a Development kit. The
Runtime kit gives you access to the following DECtalk Software
applications: mailtalk, say, speak, emacspeak, and DECface. To develop your
own DECtalk Software applications, you must order the DECtalk Software
Development kit. The Development kit gives you access to the DECtalk
Software API and some sample C programs.
---------------------------------------------------------------------------

License Requirements

You can run one copy of any DECtalk Software application at a time without
needing an LMF license. A license is required to run more than one copy of
the Runtime kit or to use the DECtalk Software Development kit. See the
section on LMF Licensing in Chapter 1 for more information.
---------------------------------------------------------------------------

 Features in DECtalk Software 4.2A

The following is a list of important features in DECtalk V4.2A:

   * Expanded main dictionary

   * Added user-dictionary suffix processing to help locate words in user
     dictionary

   * Expanded pronunciation rules and improved pronunciation

   * Homograph processing

   * Improved inline index-mark processing

   * Added the following inline commands:

        o Play command to play audio files in line with text

        o Tone command to generate tones

        o Dial command to generate DTMF tones used to dial telephone numbers

        o Stereo volume control commands

   * A new version of the mailtalk program that is fully integrated with
     mail

   * An enhanced Motif windows-based user dictionary builder that
     automatically translates text strings into phonemes

   * An improved command-line program, say, used to run DECtalk from the
     Digital UNIX command line

   * Improved computational efficiency

   * Two new sample applications

        o DECface

        o Emacspeak

   * Support for the CDE desktop environment

---------------------------------------------------------------------------

 Purpose and Audience

This guide is for the application programmer who wants to design and build
text-to-speech applications with DECtalk Software. This guide contains
instructions for installing the DECtalk Software development subset. The
installation procedure installs all files in subdirectories under the
following directory with links to the system directory hierarchy:
/usr/opt/DTKDEV420

---------------------------------------------------------------------------


Structure

This guide is designed to provide you with quick and easy access to all
information. You can easily learn about new topics and perform specific
tasks related to running DECtalk Software application programs for the
Digital UNIX operating system.

This guide's organization is listed below:

---------------------------------------------------------------------------

Chapter                 Description
---------------------------------------------------------------------------

Chapter 1               Installing DECtalk Software
Chapter 2               Introduction to DECtalk Software API
Chapter 3               Using DECtalk Software Sample Programs
Chapter 4               Creating a Customized DECtalk Software Voice
Chapter 5               DECtalk Software API Functions

---------------------------------------------------------------------------

 On-line Help

DECtalk Software on-line help is accessible in two forms:

*  Manpages -- Invoke manpage help from the UNIX command line with the
% man speak command.

*  HTML Hypertext -- Start Netscape hypertext help by launching Netscape
and loading the DtkDevGuide.html file.

---------------------------------------------------------------------------

 Conventions

This guide uses the following conventions:

Convention      Explanation

        enter  Enter means type the required information
               and press the Return key.
        mouse  Mouse refers to any pointing device, such
               as a mouse, a puck, or a stylus.
          MB1  MB1 indicates the left mouse button
     click on  Click on means to press and release MB1.
 double click  Double click means to press and release
               MB1 twice in rapid succession without
               moving the mouse.
         drag  The phrase drag means to press and hold
               MB1, move the mouse, and then release MB1
               when the pointer is in the desired
               position.
       Ctrl/x  A sequence such as Ctrl/x indicates
               that you must press the Ctrl key while
               you press another key.
 Menu Command  A right arrow (->) indicates an
               abbreviated instruction for choosing a
               command from a menu. For example, File
               -> Exit means pull down the File menu,
               move the pointer to the Exit command,
               and release MB1.
 Courier type  Courier type indicates text that you type
               and is displayed on the screen. This is
               most often used for program code examples.
   User Input  Boldface type in interactive examples
               indicates information you enter from the
               keyboard. For example:
         % ls speak
       " xxx"  Indicates a word, words, or phrases you
               must speak.


Unless otherwise noted, press Return after entering commands or responses
to command prompts.


---------------------------------------------------------------------------

Chapter 1: Installing DECtalk Software



This chapter covers the preinstallation, installation, and post-installation
tasks required to install DECtalk Software on your system. Topics include:

   * Installing DECtalk Software
        o Preinstallation Tasks
             + Accessing the Release Notes
             + Registering Your Software Licenses
             + Checking the Software Distribution Kit
        o Installation Procedure Requirements
             + Hardware Requirements
             + Software Requirements
             + Checking Current Disk Space
             + Increasing Disk Space by Using Alternative Disks
        o Installation Tasks
             + Using the CD-ROM Consolidated Distribution Media
             + Using an RIS Distribution Area
             + Starting the Installation Procedure
             + Selecting Subsets
             + Stopping the Installation
        o Post-Installation Tasks
             + Running the Installation Verification Procedure
             + Deleting DECtalk Software from Your System
             + Displaying Documentation from the CD-ROM Distribution Disc
             + Correcting Problems During Product Installation
             + Reporting Problems

---------------------------------------------------------------------------
Preinstallation Tasks

This section covers the tasks you must perform before installing DECtalk
Software. Topics include:

   *  Accessing the release notes (see Accessing the Release Notes)

   *  Checking installation procedure requirements (see Installation
     Procedure Requirements)

   *  Hardware requirements (see Hardware Requirements)

   *  Checking current disk space (see Checking Current Disk Space)

---------------------------------------------------------------------------


Accessing the Release Notes

DECtalk Software provides release notes. The release notes contain
information about changes to DECtalk Software for Digital UNIX. Digital
strongly recommends that you read these release notes before using the
product. See the Compact Disc User's Guide shipped with your media for
instructions about how to access the release notes prior to the software
installation.

The release notes for DECtalk Software are in the following files after the
DTKDEVRELNOT420 subset is installed:

/usr/opt/DTKDEV420/docs/ascii/release_notes_dev.txt

/usr/opt/DTKDEV420/docs/postscript/release_notes_dev.ps

Use the following command to read the release notes for DECtalk Software
after the DTKDEVRELNOT420 subset is installed:

# more /usr/opt/DTKDEV420/docs/ascii/release_notes_dev.txt

You can also print either file.
---------------------------------------------------------------------------

 Registering Your Software Licenses

DECtalk Software includes support for the License Management Facility
(LMF). You must register your license product authorization keys (License
PAKs) in the license database (LDB) in order to use DECtalk Software on a
newly licensed system. The License PAKs are shipped with the kit if you
ordered the licenses and media together; otherwise, they are shipped
separately to a location specified on your license order.

Note

You must have root privileges to install DECtalk Software and to register
the License PAK.

If you are installing DECtalk Software as an update on a node already
licensed for this software, you have already completed the License PAK
registration requirements.

To register a license under the Digital UNIX operating system:

Log in as root.

At the superuser prompt, edit an empty PAK template with the lmf register
command as follows, and include all the information on your License PAK:

# lmf register

LMF displays a blank template and invokes an editor to allow you to edit
the template. LMF invokes the editor that is defined by your EDITOR
environment variable. If the environment variable is undefined, LMF invokes
the vi editor.

You must enter the license information from the PAK accurately.

When you finish entering the license data, exit from the editor. If the
license data is correct, LMF copies it into the license database. If the
license data is incorrect, you may reenter the editor and correct mistakes.

Alternatively, you can create a command script in which the license
information (found in the cover letter shipped with this kit) is enclosed
between

lmf register - << ENDLMF

and

ENDLMF

Execute this script as root.
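
As a minimal sketch of the script approach, the following shell function
wraps the contents of a text file in the here-document form shown above and
writes the result to a script file. The function name make_lmf_script and
the file names pak.txt and register_pak.sh are hypothetical. The license
fields themselves must be copied from your License PAK and are not shown
here.

```shell
#!/bin/sh
# Hypothetical helper: wrap license data in an "lmf register -" here-document.
# $1 = text file holding the license fields copied from your License PAK
# $2 = script file to create
make_lmf_script() {
    pak_file=$1
    out_file=$2
    {
        echo 'lmf register - << ENDLMF'
        cat "$pak_file"
        echo 'ENDLMF'
    } > "$out_file"
}

# Example (file names are placeholders):
#   make_lmf_script pak.txt register_pak.sh
#   sh register_pak.sh        # run as root
```

The generated script is only as accurate as the PAK text you supply, so
check the file against the printed License PAK before running it.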

After you register your license, use the following command to copy the
license details from the license database (LDB) to the kernel cache:

# lmf load 0 DECTALK-SW

For complete information on using the License Management Facility, see the
Guide to Software License Management and the lmf reference page.
---------------------------------------------------------------------------

 Checking the Software Distribution Kit

Use the bill of materials (BOM) to check the contents of your DECtalk
Software distribution kit.

In addition to this guide, the software distribution kit includes the
following:

   * CD-ROM optical disc for systems with optical disc drives

   * CD-ROM booklet

If your software distribution kit is damaged or incomplete, contact your
Digital representative.

Directories and files included in the distribution kit are listed in the
following screen display:

/usr/opt/DTKDEV420/docs/ascii:
dtk420_prog_guide.txt filelist_dev.txt
dtk420_release_notes_dev.txt

/usr/opt/DTKDEV420/docs/html:
DtkDevGuideGuide.html
dt_u.html
dt_11.html
dt_22.html
dt_33.html
dt_44.html
dt_55.html
dectalkR.gif
redball.gif 
pinkball.gif
yellowball.gif

/usr/opt/DTKDEV420/docs/man/man3:
TextToSpeechAddBuffer.3 TextToSpeechPause.3
TextToSpeechCloseInMemory.3 TextToSpeechReset.3
TextToSpeechCloseLogFile.3 TextToSpeechResume.3
TextToSpeechCloseWaveOutFile.3 TextToSpeechReturnBuffer.3
TextToSpeechGetCaps.3 TextToSpeechSetLanguage.3
TextToSpeechGetLanguage.3 TextToSpeechSetRate.3
TextToSpeechGetRate.3 TextToSpeechSetSpeaker.3
TextToSpeechGetSpeaker.3 TextToSpeechShutdown.3
TextToSpeechGetStatus.3 TextToSpeechSpeak.3
TextToSpeechLoadUserDictionary.3 TextToSpeechStartup.3
TextToSpeechOpenInMemory.3 TextToSpeechSync.3
TextToSpeechOpenLogFile.3 TextToSpeechUnloadUserDictionary.3
TextToSpeechOpenWaveOutFile.3

/usr/opt/DTKDEV420/docs/postscript:
dtk420_prog_guide.ps dtk420_release_notes_dev.ps

/usr/opt/DTKDEV420/examples/dtk/dtsamples:
Imakefile aclock.c mailtalk.c xmsay.c
README.txt dtmemory.c say.c xmsay.uil

/usr/opt/DTKDEV420/include/dtk:
dtmmedefs.h dtmmiodefs.h engphon.h ttsapi.h

/usr/opt/DTKDEV420/share/man/man3:
TextToSpeechAddBuffer.3dtk TextToSpeechPause.3dtk
TextToSpeechCloseInMemory.3dtk TextToSpeechReset.3dtk
TextToSpeechCloseLogFile.3dtk TextToSpeechResume.3dtk
TextToSpeechCloseWaveOutFile.3dtk TextToSpeechReturnBuffer.3dtk
TextToSpeechGetCaps.3dtk TextToSpeechSetLanguage.3dtk
TextToSpeechGetLanguage.3dtk TextToSpeechSetRate.3dtk
TextToSpeechGetRate.3dtk TextToSpeechSetSpeaker.3dtk
TextToSpeechGetSpeaker.3dtk TextToSpeechShutdown.3dtk
TextToSpeechGetStatus.3dtk TextToSpeechSpeak.3dtk
TextToSpeechLoadUserDictionary.3dtk TextToSpeechStartup.3dtk
TextToSpeechOpenInMemory.3dtk TextToSpeechSync.3dtk
TextToSpeechOpenLogFile.3dtk TextToSpeechUnloadUserDictionary.3dtk
TextToSpeechOpenWaveOutFile.3dtk

---------------------------------------------------------------------------
 Installation Procedure Requirements

This section discusses the requirements for installing DECtalk Software.

Installing DECtalk Software takes approximately 5 minutes, depending on
your type of media and system configuration.
---------------------------------------------------------------------------

 Hardware Requirements

To install DECtalk Software, you need the following:

   *  Distribution device (if installing from media)

     Locate the drive for the CD-ROM software distribution media. The CD
     booklet or the documentation for the CD-ROM drive you are using
     explains how to load the CD-ROM media.

   * Terminal

     You can use either a hardcopy or video terminal to communicate with
     the operating system and respond to the prompts from the installation
     procedure.

See the DECtalk Software for Digital UNIX Software Product Description
(SPD) for additional hardware requirements.
---------------------------------------------------------------------------

 Software Requirements

DECtalk Software for Digital UNIX Version 4.2A requires:

   *  The Digital UNIX operating system Version 3.x or 4.0.

   *  The Multimedia Services for Digital UNIX Version 2.x.

   *  The Realtime extension.

   *  DECtalk Software Runtime subset V4.2A.

---------------------------------------------------------------------------

 Checking Current Disk Space

To check the current amount of free space for a directory path, log in to
the system where you will install DECtalk Software. You can check which
directories are mounted and their locations by viewing the /etc/fstab file.
For example:

# more /etc/fstab

/dev/rz3a / ufs rw 1 1

/dev/rz3g /usr ufs rw 1 2

/dev/rz3b swap1 ufs sw 0 2

The display indicates that /usr mounted on /dev/rz3g is the only mount
point that affects where DECtalk Software files will reside; the system has
only one local disk drive, and the /usr/opt file system resides in the g
partition of the disk on that drive.

To check the total space and the free space for the directories where
DECtalk Software will reside, enter the df command. Given the previous
display of the /etc/fstab file, which shows that only /usr is a mount
point, you need to check free space only in the /usr file system. For
example:

# df /usr
Filesystem 512-blocks Used Avail Capacity Mounted on
/dev/rz3a 79608 45648 25998 64% /
/dev/rz3g 1482190 921846 412124 69% /usr

This display shows that there are 412124 512-byte blocks (about 201 Mbytes)
free in /usr. This free space must accommodate the subsets that you opt to
install. If you choose to install all the subsets in the DECtalk Software
Development kit, you will need approximately 2 Mbytes of free disk space.
---------------------------------------------------------------------------

 Increasing Disk Space by Using Alternative Disks

The DECtalk Software installation procedure creates and loads files into
the following subdirectory:

/usr/opt/DTKDEV420

If this directory already exists, the installation procedure uses it.

If you find that there is insufficient disk space for the DECtalk Software
subsets and you know that you have additional space on alternative disks or
disk partitions for your system, perform the following steps before
installing DECtalk Software:

  1. Log in as root

  2. Create the directory /usr/opt/DTKDEV420

  3. Specify in the /etc/fstab file that the newly created directory is a
     mount point for a disk partition where there is additional space.

  4. Enter the mount -a command so that the new mount points take effect.
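
As an illustration of step 3, the following sketch prints an /etc/fstab
entry in the same format as the entries shown in Checking Current Disk
Space. The partition name /dev/rz5c and the function name fstab_line are
hypothetical, and the sketch assumes that the partition already holds a
UFS file system.

```shell
#!/bin/sh
# Print an /etc/fstab line in the same format as the sample entries
# (device, mount point, type, options, and the two trailing fields).
# $1 = disk partition (hypothetical in this example), $2 = mount point
fstab_line() {
    echo "$1 $2 ufs rw 1 2"
}

fstab_line /dev/rz5c /usr/opt/DTKDEV420

# As root, the steps above would then be roughly:
#   mkdir -p /usr/opt/DTKDEV420
#   fstab_line /dev/rz5c /usr/opt/DTKDEV420 >> /etc/fstab
#   mount -a
```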

---------------------------------------------------------------------------

 Installation Tasks

This section covers the tasks you must perform to install DECtalk Software.

Topics include:

   * Using the CD-ROM consolidated distribution media (see Using the CD-ROM
     Consolidated Distribution Media)

   * Responding to installation procedure prompts (see Starting the
     Installation Procedure)

   * Selecting subsets (see Selecting Subsets)

   * Using an RIS distribution area (see Using an RIS Distribution Area)

   * Starting the installation procedure (see Starting the Installation
     Procedure)

   * Stopping the installation (see Stopping the Installation)

---------------------------------------------------------------------------

 Using the CD-ROM Consolidated Distribution Media

The following procedure loads DECtalk Software files onto a disk belonging
to the system where you perform the installation. When DECtalk Software is
run, its executable images are mapped into memory on your system.

To install DECtalk Software from CD-ROM media:

  1. Mount the media on the appropriate disk drive.

  2. Log in as superuser (login name root) on the system where you will
     install DECtalk Software.

  3. Make sure that you are at the root (/) directory by entering the
     following command:

     # cd /

  4. Specify the /cdrom directory to be the mount point for the
     distribution file system on the drive. If your drive is rz4c, enter
     the following command:

     # mount -dr /dev/rz4c /cdrom

  5. Enter a setld command that requests the load function (-l) and
     identifies the directory in the mounted file system where DECtalk
     Software subsets are located. For example, if the directory location
     for these subsets is /cdrom/DTK420/kit, enter the following command:

     # /usr/sbin/setld -l /cdrom/DTK420/kit

  6. The installation procedure now displays the names of DECtalk Software
     subsets and asks you to specify the subsets you want to load.

See Starting the Installation Procedure to continue the installation.
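
The CD-ROM steps above can be collected into a short script. The following
is a sketch, not a supported installation tool: the names run and
install_from_cdrom and the DRY_RUN variable are hypothetical, and the
device and kit directory are the example values used in this section. By
default the script only prints the commands it would run.

```shell
#!/bin/sh
# Dry-run sketch of the CD-ROM steps. Commands are only printed unless
# DRY_RUN is set to 0 (which requires root privileges).
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

install_from_cdrom() {
    device=$1      # for example /dev/rz4c
    kit_dir=$2     # for example /cdrom/DTK420/kit
    cd / || return 1
    run mount -dr "$device" /cdrom || return 1
    run /usr/sbin/setld -l "$kit_dir"
}

install_from_cdrom /dev/rz4c /cdrom/DTK420/kit
```

Printing the commands first lets you confirm the device name and kit
directory before anything is mounted or loaded.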
---------------------------------------------------------------------------

 Using an RIS Distribution Area

If you are installing DECtalk Software subsets that reside in an RIS
distribution area (/etc/ris) on a remote system, take the following steps:

  1. Log in as superuser (login name root) on the system where you will
     install DECtalk Software.

  2. Make sure that you are at the root directory (/) by entering the
     following command:

     # cd /

  3. Enter a setld command that requests the load function (-l) and
     identifies the system where the DECtalk Software subsets are located.
     For example, if you are loading DECtalk Software subsets from an RIS
     distribution area on node axpmme, enter the following:

     # /usr/sbin/setld -l axpmme

  4. RIS now displays a menu that lists all the software subsets available
     to you and asks you to specify the subsets you want to load.

See Starting the Installation Procedure to continue the installation.
---------------------------------------------------------------------------

 Starting the Installation Procedure

Before starting the installation procedure, check for and remove any
previously installed DECtalk Software subsets:

  1. Log in as superuser and verify that you are at the root directory.
     Check whether any DECtalk Software subsets are already installed by
     entering the following commands:

     % su root

     # cd /

     # /usr/sbin/setld -i | grep DTKDEV

  2. Deinstall any installed subsets with the prefix DTKDEV by
     entering the following command:

     # cd /

     # /usr/sbin/setld -d (name of subset)

  3. To start the installation procedure, enter the following command:

     # /usr/sbin/setld -l /dev/rmt0h

  4. Then, respond to the installation procedure prompts as described in
     Selecting Subsets.

---------------------------------------------------------------------------

 Selecting Subsets

The following section presents a complete installation procedure, including
all messages that are displayed on your screen during the installation.

You must specify which DECtalk Software subsets you want to load. If you
specify more than one number at the prompt, separate each number with a
space, not a comma.

# setld -l .

The subsets listed below are optional:

There may be more optional subsets than can be presented on a single
screen. If this is the case, you can choose subsets screen by screen
or all at once on the last screen. All of the choices you make will
be collected for your confirmation before any subsets are installed.

1) DECtalk Software V4.2A for Digital UNIX Development Documentation.
2) DECtalk Software V4.2A for Digital UNIX Development Kit.
3) DECtalk Software V4.2A for Digital UNIX Release Notes.
4) DECtalk Software V4.2A for Digital UNIX Sample Programs.

Or you may choose one of the following options:

5) ALL of the above
6) CANCEL selections and redisplay menus
7) EXIT without installing any subsets

Enter your choices or press RETURN to redisplay menus.

Choices (for example, 1 2 4-6): 5

Next, the script lets you verify your choice. For example, if you enter 5
in response to the previous prompt, you will see the following display:
You are installing the following optional subsets:

DECtalk Software V4.2A for Digital UNIX Development Documentation.
DECtalk Software V4.2A for Digital UNIX Development Kit.
DECtalk Software V4.2A for Digital UNIX Release Notes.
DECtalk Software V4.2A for Digital UNIX Sample Programs.

Is this correct? (y/n): y

If the displayed subsets are not the ones you intended to choose, enter n.
In this case, the subset selection menu is displayed again and you can
correct your choice of optional subsets. If the displayed subsets are the
ones you want to load, enter y. After you respond to this question, the
rest of the installation proceeds automatically and all the selected
subsets are loaded. A sample of the rest of the installation script is
listed below.
Checking file system space required to install selected subsets:

File system space checked OK.

4 subset(s) will be installed.

Loading 1 of 4 subset(s)....

DECtalk Software V4.2A for Digital UNIX Development Documentation.
Copying from . (disk)
Verifying

Loading 2 of 4 subset(s)....

***********************************************************************
*                                                                     *
* DECtalk Software Application Services V4.2A                         *
* Development Subset                                                  *
*                                                                     *
* Copyright(c)Digital Equipment Corporation, 1996 All Rights          *
* Reserved                                                            *
*                                                                     *
* Unpublished rights reserved under the copyright laws of the United  *
* States. The software contained on this media is proprietary to      *
* and embodies the confidential technology of Digital Equipment       *
* Corporation. Possession, use, duplication or dissemination of the   *
* software and media is authorized only pursuant to a valid written   *
* license from Digital Equipment Corporation.                         *
*                                                                     * 
* RESTRICTED RIGHTS LEGEND Use, duplication, or disclosure by the     *
* U.S. Government is subject to restrictions as set forth in          *
* Subparagraph (c)(1)(ii) of DFARS 252.227-7013, or in FAR 52.227-19, *
* or in FAR 52.227-14 Alt. III as applicable.                         *
*                                                                     *
***********************************************************************

DECtalk Software V4.2A for Digital UNIX Development Kit.
Copying from . (disk)
Verifying

Loading 3 of 4 subset(s)....

***********************************************************************
*                                                                     *
* DECtalk Software Application Services V4.2A                         *
* Sample Programs Subset                                              *
*                                                                     *
* Copyright(c)Digital Equipment Corporation, 1996 All Rights          *
* Reserved                                                            *
*                                                                     *
* Unpublished rights reserved under the copyright laws of the United  *
* States. The software contained on this media is proprietary to      *
* and embodies the confidential technology of Digital Equipment       *
* Corporation. Possession, use, duplication or dissemination of the   *
* software and media is authorized only pursuant to a valid written   *
* license from Digital Equipment Corporation.                         *
*                                                                     *
* RESTRICTED RIGHTS LEGEND Use, duplication, or disclosure by the     *
* U.S. Government is subject to restrictions as set forth in          *
* Subparagraph (c)(1)(ii) of DFARS 252.227-7013, or in FAR 52.227-19, *
* or in FAR 52.227-14 Alt. III as applicable.                         *
*                                                                     *
***********************************************************************

DECtalk Software V4.2A for Digital UNIX Sample Programs.
Copying from . (disk)
Verifying

Loading 4 of 4 subset(s)....

DECtalk Software V4.2A for Digital UNIX Release Notes.
Copying from . (disk)
Verifying

4 of 4 subset(s) installed successfully.

***********************************************************************

DECtalk Software V4.2A development documentation subset (DTKDEVDOC420)

installation completed successfully. This installation puts the DECtalk
Software runtime documents in html format in the following directory
/usr/opt/DTKDEV420/docs/html.
You can use the Netscape browser to view the documents. Start by opening
the file:
/usr/opt/DTKDEV420/docs/html/DtkDevGuideGuide.html

***********************************************************************

Configuring "DECtalk Software V4.2A for Digital UNIX Development
Documentation." (DTKDEVDOC420)

**************************************************************************

DECtalk Software V4.2A development subset (DTKDEV420) installation
completed successfully.

**************************************************************************

Configuring "DECtalk Software V4.2A for Digital UNIX Development Kit."
(DTKDEV420)

**************************************************************************

DECtalk Software V4.2A sample program subset (DTKSAMP420) installation
completed successfully. This installation puts the sample programs in
the following directory:

/usr/examples/dtk/dtsamples

**************************************************************************

Configuring "DECtalk Software V4.2A for Digital UNIX Sample Programs."
(DTKSAMP420)

****************************************************************************

DECtalk Software V4.2A development release notes subset (DTKDEVRELNOT420)
installation completed successfully. This installation put DECtalk Software
development kit release notes in the following directories:

/usr/opt/DTKDEV420/docs/ascii and
/usr/opt/DTKDEV420/docs/postcript

****************************************************************************

Configuring "DECtalk Software V4.2A for Digital UNIX Release Notes."
(DTKDEVRELNOT420)

---------------------------------------------------------------------------

 Stopping the Installation

To stop the installation procedure at any time:

  1. Enter Ctrl/C. Then, interactively delete the files created by the
     installation up to the point where you stopped the installation.
  2. Refer to the following file, which lists the directories and files
     created during the DECtalk Software installation:

     /usr/opt/DTKDEV420/docs/ascii/filelist.txt

     If you encounter any failures during installation, see Reporting
     Problems.

You may interrupt the installation procedure at any point. However, if you
do, the installation may not be left in a useful state. Remove all the
subsets you installed and reinstall them.

---------------------------------------------------------------------------
 Post-Installation Tasks

This section explains what you need to do after the installation to make
DECtalk Software ready for use. Topics include:

   *  Running the installation verification procedure (see Running the
     Installation Verification Procedure)

   *  Deleting DECtalk Software from your system (see Deleting DECtalk
     Software from Your System)

   *  Displaying documentation from the CD-ROM distribution disc (see
     Displaying Documentation from the CD-ROM Distribution Disc)

   *  Solving problems during product installation (see Correcting Problems
     During Product Installation)

   *  Reporting failures during product use (see Reporting Problems)

---------------------------------------------------------------------------

 Running the Installation Verification Procedure

You can run the Installation Verification Procedure (IVP) during the
installation or you can run the IVP independently after installing DECtalk
Software to verify that the software is available on your system. You might
also want to run the IVP after a system failure to be sure that users can
access DECtalk Software.

To run the IVP:

  1. Enter the following commands:

     % su root

     # /usr/sbin/setld -v DTKRT420

  2. The DECtalk Software IVP verifies the installation as follows:

        o A check for a valid LMF license is made. If no license is found,
          the IVP fails because the software cannot be tested.

        o DECtalk Software requires that the Multimedia Software for
          Digital UNIX server mmeserver be up and running. If the mmeserver
          is not already running then the IVP fails. Start the server and
          try again.
  3. To start the server, enter the following commands:

     % su root

     # mmeserver&
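
Because the IVP fails when mmeserver is not running, it can be convenient
to check for the server first. The following is a minimal sketch of such a
check. The function name name_in_list is hypothetical, and the script
assumes that your ps command accepts the -e and -o comm= options to list
command names only.

```shell
#!/bin/sh
# Succeed if standard input (one command name or path per line) contains
# a command whose base name is exactly $1.
name_in_list() {
    awk -v name="$1" '
        { n = split($1, parts, "/"); if (parts[n] == name) found = 1 }
        END { exit(found ? 0 : 1) }'
}

# Assumption: ps -e -o comm= prints one command name per line, no header.
if ps -e -o comm= 2>/dev/null | name_in_list mmeserver; then
    echo "mmeserver is running; the IVP can be started"
else
    echo "mmeserver is not running; start it before running the IVP"
fi
```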

---------------------------------------------------------------------------

 Deleting DECtalk Software from Your System

If you must remove a version of DECtalk Software from your system, delete
each subset that you previously installed.

To delete a subset, do the following:

  1. Log in as superuser (login name root), as follows:

     % su root

  2. Verify that you are at the root directory (/) by entering the
     following command:

     # cd /

  3. Enter the following form of the setld command:

     # setld -i | grep DTK

  4. Look for the word installed in the listing produced, and then
     delete the installed subsets. For example:

     # setld -d DTKDEV420 DTKDEVDOC420

---------------------------------------------------------------------------

 Displaying Documentation from the CD-ROM Distribution Disc

The DECtalk Software documentation is provided on the Digital UNIX Layered
Products Online Documentation CD-ROM in Bookreader (.decw_book) file
format. You can display the Bookreader files on your workstation using the
DECwindows Bookreader application. For information on accessing and
displaying these files, see the Digital UNIX Layered Products Disc User's
Guide.
---------------------------------------------------------------------------

 Correcting Problems During Product Installation

If errors occur during the installation, the system displays failure
messages. For example, if the installation fails due to insufficient disk
space, a message similar to the following is displayed:

There is not enough space for subset SUBSET_NAME

SUBSET_DESCRIPTION (SUBSET_NAME) will not be loaded.

where:

SUBSET_NAME is the name of the subset
SUBSET_DESCRIPTION is the description of the subset

For example, "DTKDEVRELNOT420" is a subset name, and "DECtalk Software
for Digital UNIX Release Notes V4.2A" is a subset description.

Errors can occur during the installation if any of the following conditions
exist:

   * Operating system version is incorrect.

   * Prerequisite software version is incorrect.

   * Disk space is insufficient.

   * System parameter values for successful installation are insufficient.

For descriptions of error messages generated by these conditions, see the
Digital UNIX documentation on system messages, recovery procedures, and
software installation.
---------------------------------------------------------------------------

 Reporting Problems

If an error occurs while DECtalk Software is in use and you believe the
error is caused by a problem with the product, take one of the following
actions:

   * If you have a Software Product Services Support Agreement, contact
     your Customer Support Center (CSC) by telephone or by using the
     electronic means provided with your support agreement (such as
     DSNlink).

     The CSC provides telephone support for high-level advisory and
     remedial assistance. When you initially contact the CSC, indicate the
     following:

        o The name and version number of the operating system you are using

        o The version number of DECtalk Software you are using

        o The hardware system you are using (such as a model number)

        o A brief description of the problem (one sentence, if possible)

        o How critical the problem is

   * If you have a Self-Maintenance Software Agreement, you can submit a
     Software Performance Report (SPR).

   * If you do not have any type of software services support agreement and
     you purchased DECtalk Software within the past year, you can submit an
     SPR if you think the problem is caused by a software error.

When you submit an SPR, take the following steps:

   * Describe as accurately as possible the circumstances and state of the
     system when the problem occurred. Include the description and version
     number of the DECtalk Software being used. Explain the problem with
     specific examples.

   * Reduce the problem to its elements.

   * Remember to include listings of any command files, include files,
     relevant data files, and so forth.

   * Provide a listing of the program.

   * If the program is longer than 50 lines, submit a copy of it on
     machine-readable media (floppy diskette or magnetic tape). If
     necessary, also submit a copy of the program library used to build the
     application.

     For information about submitting media, see the tar(1) reference page.

   * Report only one problem per SPR. This will facilitate a faster
     response.

   * Mail the SPR package to Digital.

If the problem is related to DECtalk Software documentation, you can do one
of the following:

   * Report the problem to the CSC (if you have a Software Product Services
     Support Agreement and the problem is severe).

   * Fill out the Reader's Comments form (located at the back of the
     document that contains the error) and send the form to Digital. Be
     sure to include the section and page number where the error occurs.



---------------------------------------------------------------------------


Chapter 2:
Introduction to the DECtalk Software API



This chapter provides an introduction to the DECtalk Software
Text-To-Speech API services and a discussion of programming text-to-speech
applications using the API services.

Topics include:

   *  DECtalk Software Text-To-Speech Services

   *  Using the Text-To-Speech API

---------------------------------------------------------------------------

DECtalk Software Text-To-Speech Services

The Text-To-Speech API is a Digital extension to the multimedia API
specified by the MME services for the Digital UNIX operating system. The
API function set gives you a flexible method of manipulating the various
parameters of DECtalk Software functionality from within your application.
These functions perform a wide range of tasks associated with the
Text-To-Speech system and are listed by functional category in Table 1-1.

Table 1-1 -- Functions Listed by Category

Function                                    Purpose
Core API Functions
                   TextToSpeechStartup()  Initializes and starts up
                                          text-to-speech system.
                     TextToSpeechSpeak()  Speaks text from a buffer.
                  TextToSpeechShutdown()  Shuts down text-to-speech system.
Audio Output Control Functions
                     TextToSpeechPause()  Pauses output.
                    TextToSpeechResume()  Resumes output.
                     TextToSpeechReset()  Purges the text-to-speech system
                                          and stops output.
Blocking Synchronization Function
                      TextToSpeechSync()  Synchronizes to the text stream.
 Control and Status Functions
                TextToSpeechSetSpeaker()  Selects one of nine speaking
                                          voices.
                TextToSpeechGetSpeaker()  Returns the last speaking voice
                                          to have spoken.
                   TextToSpeechSetRate()  Sets the speaking rate of the
                                          text-to-speech system.
                   TextToSpeechGetRate()  Gets the speaking rate of the
                                          text-to-speech system.
               TextToSpeechSetLanguage()  Sets the language to be used.
               TextToSpeechGetLanguage()  Returns the language in use.
                 TextToSpeechGetStatus()  Gets status of text-to-speech
                                          system.
           TextToSpeechOpenWaveOutFile()  Opens a file for output.
                                          TextToSpeechSpeak() writes audio
                                          data in wave format to this file.
          TextToSpeechCloseWaveOutFile()  Closes the specified wave file.
               TextToSpeechOpenLogFile()  Opens a log file.
              TextToSpeechCloseLogFile()  Closes a log file.
              TextToSpeechOpenInMemory()  Produces buffered speech samples
                                          in shared memory.
             TextToSpeechCloseInMemory()  Returns the text-to-speech
                                          system to its normal state.
                 TextToSpeechAddBuffer()  Adds a shared-memory buffer to
                                          the memory buffer list.
              TextToSpeechReturnBuffer()  Returns the current
                                          shared-memory buffer.
                   TextToSpeechGetCaps()  Retrieves the capabilities of
                                          the text-to-speech system.
Special Text-To-Speech Modes
 Loading and Unloading a User Dictionary
        TextToSpeechLoadUserDictionary()  Loads user dictionary.
      TextToSpeechUnloadUserDictionary()  Unloads user dictionary.

---------------------------------------------------------------------------

 Using the Text-To-Speech API

This section describes how to write application programs using the DECtalk
Software API. The DECtalk Software API can be called from within any C
program on the Digital UNIX system. This API has been designed to be
extensible for future Text-To-Speech growth while still being easy to use.
The current DECtalk Software implementation supports only one instance of
Text-To-Speech per process. However, several copies of Text-To-Speech can
run simultaneously as separate processes.

Core API Functions

The core Text-To-Speech API functions are the following:

*  TextToSpeechStartup() allocates system resources.

*  TextToSpeechSpeak() queues text to the system.

*  TextToSpeechShutdown() returns all system resources allocated by the
TextToSpeechStartup() function.

The simplest application might use only these functions.

About the TextToSpeechSpeak() Function

The TextToSpeechSpeak() function is used to pass a null terminated string
of characters to the Text-To-Speech system. The system queues all
characters up to the null character. If the TTS_FORCE flag is not used in
the call to the TextToSpeechSpeak() function, then the queued characters
are seamlessly concatenated with previously queued characters. The
TTS_FORCE flag is used to force a string of characters to be spoken even
though the string might not complete a clause. For example:

TextToSpeechSpeak("This will be spoken. ", TTS_NORMAL );

This text is spoken immediately by the system because it is terminated by a
period and a space. These last two characters are one way to create a
clause boundary.

TextToSpeechSpeak("This will be spok", TTS_NORMAL );

This produces output only after the following line of code executes to
complete the phrase.

TextToSpeechSpeak("en. ", TTS_NORMAL );

Finally, a nonphrase string can be forced to be spoken by using the
TTS_FORCE flag.

TextToSpeechSpeak("This will be spok", TTS_FORCE );

Note that the word spoken is not pronounced correctly in this case even if
the final characters in the word spoken, (en), are queued immediately
afterward.

The TTS_FORCE flag causes the previous line to be spoken before taking any
subsequently queued characters into account.
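
The queuing behavior described above can be pictured with a small
stand-alone model. This is only a sketch written for illustration, not the
DECtalk Software implementation: the names ModelQueue, model_speak(),
MODEL_NORMAL, and MODEL_FORCE are hypothetical, and the model treats a
period followed by a space as the only clause boundary.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in flag values for this model only; real programs use the
   TTS_NORMAL and TTS_FORCE constants from ttsapi.h. */
enum { MODEL_NORMAL = 0, MODEL_FORCE = 1 };

#define MODEL_QUEUE_SIZE 256

typedef struct {
    char pending[MODEL_QUEUE_SIZE];   /* text queued but not yet spoken */
} ModelQueue;

/* Return the index just past the last ". " in s, or 0 if there is none. */
size_t last_boundary(const char *s)
{
    size_t end = 0;
    size_t i;

    for (i = 0; s[i] != '\0' && s[i + 1] != '\0'; i++) {
        if (s[i] == '.' && s[i + 1] == ' ')
            end = i + 2;
    }
    return end;
}

/*
 * Queue text, then copy whatever the model would speak now into spoken
 * (which must hold MODEL_QUEUE_SIZE bytes). Returns the number of
 * characters released. Text that does not fit in the queue is dropped.
 */
size_t model_speak(ModelQueue *q, const char *text, int flags, char *spoken)
{
    size_t used = strlen(q->pending);
    size_t n;

    /* New text is concatenated with anything still waiting. */
    strncat(q->pending, text, MODEL_QUEUE_SIZE - 1 - used);

    /* FORCE releases everything; otherwise only complete clauses. */
    n = (flags == MODEL_FORCE) ? strlen(q->pending)
                               : last_boundary(q->pending);

    memcpy(spoken, q->pending, n);
    spoken[n] = '\0';
    memmove(q->pending, q->pending + n, strlen(q->pending + n) + 1);
    return n;
}
```

In this model, queuing "This will be spok" with MODEL_NORMAL releases
nothing, and a later "en. " releases the whole sentence. Queuing the same
fragment with MODEL_FORCE releases the fragment at once, which is why the
word is mispronounced in the example above.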

It is important that all sentences are separated with a space character. To
make sure of this, it is recommended that a space character is routinely
included after the final punctuation in a sentence. An example of what will
happen without this is shown below:

TextToSpeechSpeak("They are tired.", TTS_NORMAL );
TextToSpeechSpeak("I am Cold.", TTS_NORMAL );

Because there is no space, the Text-To-Speech system processes the
following string:

"They are tired.I am Cold."

The string "tired.I" will be pronounced incorrectly because the system will
treat it as one item instead of two words.
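
One way to follow this recommendation is to pass every sentence through a
small helper before queuing it. The sketch below is an illustration only;
with_sentence_space() is a hypothetical name and is not part of the DECtalk
Software API. It assumes that a sentence ends in a period, exclamation
point, or question mark.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy text into out, adding one trailing space when the text ends in
 * sentence punctuation with no space after it. Returns 0 on success or
 * -1 if out is too small to hold the result.
 */
int with_sentence_space(const char *text, char *out, size_t out_size)
{
    size_t len = strlen(text);
    int needs_space = len > 0 && strchr(".!?", text[len - 1]) != NULL;

    if (len + (size_t)needs_space + 1 > out_size)
        return -1;

    memcpy(out, text, len);
    if (needs_space)
        out[len++] = ' ';
    out[len] = '\0';
    return 0;
}
```

An application would then queue the contents of out, rather than the
original string, with the TextToSpeechSpeak() function.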

Audio Output Control Functions

An application can control speech output using the TextToSpeechPause()
function, the TextToSpeechResume() function, and the TextToSpeechReset()
function. These functions pause the audio output, resume output after
pausing, and reset the Text-To-Speech system. A reset discards all queued
text, and stops and discards all queued audio. If the application has
called the TextToSpeechOpenInMemory() function to store speech samples in
memory, a reset causes all buffers to be returned to the application.

Blocking Synchronization Function

A special function called TextToSpeechSync() is provided to block until all
text previously queued by the TextToSpeechSpeak() function is spoken. Once
this function is called, there is no way to abort until all text is
processed. This could take hours if there is sufficient text queued.
Nonblocking synchronization can be provided using the index mark command.
See the Runtime User's Guide for more information on the index mark
command.
---------------------------------------------------------------------------

 Control and Status Functions

The functions described in the following table provide additional control
and status information for the Text-To-Speech system.

Table 1-2 -- Control and Status Functions

Function                                    Descriptions
TextToSpeechSetSpeaker()                    Sets the speaker's voice
                                            (which  becomes active at the
                                            next clause boundary).
TextToSpeechGetSpeaker()                    Returns the value of the last
                                            speaker to have spoken. This
                                            value might not be the value
                                            previously set by the
                                            TextToSpeechSetSpeaker()
                                            function.
TextToSpeechSetRate()                       Sets the speaking rate, which
                                            becomes active at the next
                                            clause boundary.
TextToSpeechGetRate()                       Gets the speaking rate (the
                                            current rate setting is
                                            returned even if it has not
                                            been activated).
TextToSpeechSetLanguage()                   Sets the Text-To-Speech system
                                            language. (Currently, this
                                            must be TTS_AMERICAN_ENGLISH ).
TextToSpeechGetLanguage()                   Returns the current
                                            Text-To-Speech system language.
TextToSpeechGetStatus()                     Returns various Text-To-Speech
                                            system parameters, such as the
                                            number of characters in the
                                            text pipe, the ID of the wave
                                            output device, and a Boolean
                                            value that indicates whether
                                            the system is speaking or
                                            silent.
TextToSpeechGetCaps()                       Returns the capabilities of
                                            the Text-To-Speech system,
                                            which includes the version
                                            number of the system, the
                                            number of speakers, the
                                            maximum and minimum speaking
                                            rate, and the supported
                                            languages.

---------------------------------------------------------------------------

 Special Text-To-Speech Modes

After the TextToSpeechStartup() function is called by an application, it
can then call the TextToSpeechSpeak() function to speak text. The
application can also use the Text-To-Speech API to select different
modes. These modes allow for writing wave files; writing a log file, which
can contain text, phonemes, or syllables; or writing the audio (speech)
samples to memory. Each mode-switch function has a corresponding function
to return the Text-To-Speech system to the startup state. These functions
are listed below.

Open                                    Close
TextToSpeechOpenWaveOutFile()           TextToSpeechCloseWaveOutFile()
TextToSpeechOpenLogFile()               TextToSpeechCloseLogFile()
TextToSpeechOpenInMemory()              TextToSpeechCloseInMemory()

The Text-To-Speech system must be in the startup state before calling any
of the Open functions listed above. The corresponding Close functions
return the system to the startup state.
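
The open and close rule can be summarized as a small state model. The
sketch below is not DECtalk Software code; ModelMode, model_open(), and
model_close() are hypothetical names used only to illustrate that a special
mode can be entered from the startup state alone, and that closing the
active mode returns to the startup state.

```c
typedef enum {
    MODE_STARTUP,     /* state after startup: no special mode is open */
    MODE_WAVE_FILE,   /* speech is written to a wave file */
    MODE_LOG_FILE,    /* text, phonemes, or syllables are logged */
    MODE_IN_MEMORY    /* speech samples are returned in buffers */
} ModelMode;

/* Enter a special mode. Allowed only from the startup state.
   Returns 0 on success or -1 if the request is refused. */
int model_open(ModelMode *current, ModelMode requested)
{
    if (*current != MODE_STARTUP || requested == MODE_STARTUP)
        return -1;
    *current = requested;
    return 0;
}

/* Leave a special mode. The mode being closed must be the active one.
   Returns 0 on success or -1 if the request is refused. */
int model_close(ModelMode *current, ModelMode closing)
{
    if (closing == MODE_STARTUP || *current != closing)
        return -1;
    *current = MODE_STARTUP;
    return 0;
}
```

In this model, an attempt to open the log-file mode while the wave-file
mode is active is refused until the wave-file mode has been closed.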
---------------------------------------------------------------------------

 Loading and Unloading a User Dictionary

The TextToSpeechLoadUserDictionary() function is used to load a user
dictionary created with the userdic program. The
TextToSpeechUnloadUserDictionary() function is used to unload a user
dictionary.
---------------------------------------------------------------------------

 Creating a Wave File

After calling the TextToSpeechStartup() function, an application can call
the TextToSpeechOpenWaveOutFile() function. This function blocks until all
previously queued text has been processed. After the function returns, all
text subsequently queued by the TextToSpeechSpeak() function is converted
to speech and written into a wave file. The TextToSpeechCloseWaveOutFile()
function blocks until the speech from all previously queued text is written
to the file.
---------------------------------------------------------------------------

 Creating a Log File

After calling the TextToSpeechStartup() function, an application can call
the TextToSpeechOpenLogFile() function. This function blocks until all
previously queued text has been processed. After the function returns, all
text subsequently queued by the TextToSpeechSpeak() function is written to
a log file as either text, phonemes, or syllables. The phonemes and
syllables are written using the Arpabet phoneme alphabet. The
TextToSpeechCloseLogFile() function terminates phoneme logging and blocks
until the speech from all previously queued text is processed.
---------------------------------------------------------------------------

 Storing Speech Samples in Memory

To cause all speech samples to be put in memory, the application must call
the TextToSpeechOpenInMemory() function. This function blocks until all
previously queued text has been processed. The memory buffers to store the
speech samples are supplied to the Text-To-Speech system by the
TextToSpeechAddBuffer() function. This function is used to pass a pointer
to a structure of type TTS_BUFFER_T. (The TTS_BUFFER_T structure is
defined in the include file ttsapi.h.)

When a buffer is completed, the buffer is returned to the application by
sending a message to the callback function that was passed to the
TextToSpeechStartup() function. A pointer to the returned TTS_BUFFER_T
structure is contained in the LPARAM parameter of the message. The user is
responsible for the allocation and freeing of memory for the following
elements in the TTS_BUFFER_T structure: lpData, lpPhonemeArray, and
lpIndexArray.

The TTS_BUFFER_T structure is considered completed when any one of the
following conditions occurs:

o The sample buffer, which is pointed to by element lpData, is filled.

o The phoneme array is filled.

o The index mark array is filled.

o A TTS_FORCE is used in a call to the TextToSpeechSpeak() function.

The application must not modify any buffer passed to the Text-To-Speech
system by the TextToSpeechAddBuffer() function until the buffer is returned
from the Text-To-Speech system in a message. The application then owns the
buffer. If no buffers are available, the system blocks. If the application
is processing relatively long passages of text, it is recommended that the
application queue several buffers and then requeue each buffer after
finishing with it so that the system is never idle.

A call to the TextToSpeechReset() function returns all buffers to the
application. The TextToSpeechReturnBuffer() function is supplied to force
the return of the current TTS_BUFFER_T structure, whether it is filled or
not. Most applications might not require this function. It is
included so that an application can obtain the last buffer without forcing
that buffer to be sent with the TTS_FORCE flag in the
TextToSpeechSpeak() function. This might be required if the application
performs its own buffer management.

The TTS_BUFFER_T structure and its elements are defined as follows:

typedef struct TTS_PHONEME_TAG {
    DWORD dwPhoneme;
    DWORD dwPhonemeSampleNumber;
    DWORD dwPhonemeDuration;
    DWORD dwReserved;
} TTS_PHONEME_T;

typedef TTS_PHONEME_T * LPTTS_PHONEME_T;

typedef struct TTS_INDEX_TAG {
    DWORD dwIndexValue;
    DWORD dwIndexSampleNumber;
    DWORD dwReserved;
} TTS_INDEX_T;

typedef TTS_INDEX_T * LPTTS_INDEX_T;

typedef struct TTS_BUFFER_TAG {
    LPSTR lpData;
    LPTTS_PHONEME_T lpPhonemeArray;
    LPTTS_INDEX_T lpIndexArray;
    DWORD dwMaximumBufferLength;
    DWORD dwMaximumNumberOfPhonemeChanges;
    DWORD dwMaximumNumberOfIndexMarks;
    DWORD dwBufferLength;
    DWORD dwNumberOfPhonemeChanges;
    DWORD dwNumberOfIndexMarks;
    DWORD dwReserved;
} TTS_BUFFER_T;

typedef TTS_BUFFER_T * LPTTS_BUFFER_T;

TTS_BUFFER_T Structure Initialization

The TTS_BUFFER_T structure and the arrays that its lpData, lpPhonemeArray,
and lpIndexArray elements point to must be allocated and freed by the user.
(Note that the last two pointers can optionally be set to NULL if they are
not used by the application.)

*  The lpData element points to a byte array. The dwMaximumBufferLength
must be set to the length of this array.

*  If the lpPhonemeArray element is set to NULL, then no phonemes are
returned. Otherwise, the lpPhonemeArray element must point to an
application-allocated array of structures of type TTS_PHONEME_T. The
length of this array must be copied into the
dwMaximumNumberOfPhonemeChanges element.

*  If the lpIndexArray element is set to NULL, then no index marks are
returned. Otherwise, the lpIndexArray element must point to an
application-allocated array of structures of type TTS_INDEX_T. The length
of this array must be copied into the dwMaximumNumberOfIndexMarks
element.
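
The initialization rules above might be implemented along the following
lines. This is a minimal sketch, not code from the DECtalk Software kit:
buffer_init() and buffer_free() are hypothetical helper names, and the
typedefs are stand-ins that repeat the definitions shown earlier so that
the example compiles on its own. A real application would include ttsapi.h
instead.

```c
#include <stdlib.h>

/* Stand-in types; a real program takes these from ttsapi.h. */
typedef unsigned int DWORD;
typedef char *LPSTR;

typedef struct TTS_PHONEME_TAG {
    DWORD dwPhoneme;
    DWORD dwPhonemeSampleNumber;
    DWORD dwPhonemeDuration;
    DWORD dwReserved;
} TTS_PHONEME_T;

typedef struct TTS_INDEX_TAG {
    DWORD dwIndexValue;
    DWORD dwIndexSampleNumber;
    DWORD dwReserved;
} TTS_INDEX_T;

typedef struct TTS_BUFFER_TAG {
    LPSTR lpData;
    TTS_PHONEME_T *lpPhonemeArray;
    TTS_INDEX_T *lpIndexArray;
    DWORD dwMaximumBufferLength;
    DWORD dwMaximumNumberOfPhonemeChanges;
    DWORD dwMaximumNumberOfIndexMarks;
    DWORD dwBufferLength;
    DWORD dwNumberOfPhonemeChanges;
    DWORD dwNumberOfIndexMarks;
    DWORD dwReserved;
} TTS_BUFFER_T;

/* Release the application-owned arrays and clear every element. */
void buffer_free(TTS_BUFFER_T *buf)
{
    TTS_BUFFER_T empty = {0};

    free(buf->lpData);
    free(buf->lpPhonemeArray);
    free(buf->lpIndexArray);
    *buf = empty;
}

/*
 * Allocate the sample array and, optionally, the phoneme and index mark
 * arrays, then record their lengths. Passing 0 for max_phonemes or
 * max_index_marks leaves that pointer NULL, so no phonemes or index
 * marks are requested. Returns 0 on success or -1 on failure.
 */
int buffer_init(TTS_BUFFER_T *buf, DWORD data_bytes,
                DWORD max_phonemes, DWORD max_index_marks)
{
    TTS_BUFFER_T empty = {0};

    *buf = empty;
    if (data_bytes == 0)
        return -1;

    buf->lpData = malloc(data_bytes);
    if (max_phonemes > 0)
        buf->lpPhonemeArray = calloc(max_phonemes, sizeof(TTS_PHONEME_T));
    if (max_index_marks > 0)
        buf->lpIndexArray = calloc(max_index_marks, sizeof(TTS_INDEX_T));

    if (buf->lpData == NULL ||
        (max_phonemes > 0 && buf->lpPhonemeArray == NULL) ||
        (max_index_marks > 0 && buf->lpIndexArray == NULL)) {
        buffer_free(buf);
        return -1;
    }

    buf->dwMaximumBufferLength = data_bytes;
    buf->dwMaximumNumberOfPhonemeChanges = max_phonemes;
    buf->dwMaximumNumberOfIndexMarks = max_index_marks;
    return 0;
}
```

A buffer prepared this way would then be passed to the
TextToSpeechAddBuffer() function, and buffer_free() would be called only
after the Text-To-Speech system has returned the buffer to the application.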

TTS_BUFFER_T Return Values

When the TTS_BUFFER_T structure is returned to the application, it contains
the following return values:

*  The number of bytes of audio samples in the array pointed to by the
lpData element is returned in the dwBufferLength element.

*  The number of phoneme changes contained in the array pointed to by the
lpPhonemeArray element is returned in the dwNumberOfPhonemeChanges
element.

*  The number of index marks contained in the array pointed to by the
lpIndexArray element is returned in the dwNumberOfIndexMarks element.

The index and phoneme arrays each contain a time stamp in the form of a
sample number. This sample number is initialized at zero at startup and
after each call to the TextToSpeechReset() function. The phoneme array also
contains the current phoneme duration in frames. Each frame is
approximately 6.4 milliseconds.
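
The time stamps and durations can be converted to milliseconds with simple
arithmetic, as in the sketch below. The function names are hypothetical.
The frame length is the approximate figure given above, and the sample
rate is left as a parameter because this section does not state it.

```c
/* Approximate length of one frame, in milliseconds, as given in the text. */
#define FRAME_MS 6.4

/* Convert a phoneme duration in frames to approximate milliseconds. */
double frames_to_ms(unsigned long frames)
{
    return (double)frames * FRAME_MS;
}

/*
 * Convert a sample-number time stamp to milliseconds since startup or the
 * last reset. samples_per_second is supplied by the caller and must be
 * the rate at which the speech samples were produced.
 */
double sample_to_ms(unsigned long sample_number,
                    unsigned long samples_per_second)
{
    return 1000.0 * (double)sample_number / (double)samples_per_second;
}
```

For example, a phoneme with a dwPhonemeDuration of 10 frames lasts about 64
milliseconds.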
---------------------------------------------------------------------------


Chapter 3:
DECtalk Software Sample Programs



This chapter provides instructions on how to build the sample programs.
Topics include:

   *  DECtalk Software Sample Programs

   *  Building DECtalk Software Sample Programs

---------------------------------------------------------------------------

 Sample Programs

Some applications are included with DECtalk Software. These sample
applications have been included to demonstrate the use of DECtalk Software
APIs. These sources can be used as templates for other applications that
you might want to develop. Sources to these programs can be found in:

/usr/examples/dtk/dtsamples

The samples and a brief description are listed below.

   *  xmsay.c -- xmsay and its companion UIL file, xmsay.uil, demonstrate
     the use of DECtalk Software APIs in the Motif windows environment.

   *  say.c -- A command-line program that speaks the text typed on the
     command line.

   *  mailtalk.c -- mailtalk announces the arrival of mail when new mail is
     received. The file mailtalk.ini in /usr/lib/dtk/ contains default
     announcement messages that mailtalk uses. To have mailtalk speak your
     own custom messages copy the mailtalk.ini file into your login
     directory and edit the strings.

   *  aclock.c -- Announces the time at specified intervals.

   *  dtmemory.c -- In dtmemory DECtalk Software passes back synthesized
     speech in buffers. These buffers are written out into a wave file.
---------------------------------------------------------------------------

 Building the Sample Programs

Sample programs can be created from the sources provided in
/usr/examples/dtk/dtsamples. This section describes the procedure for
building the sample programs. Before proceeding, make sure that the
DECtalk Software Development kit has been installed. See the DECtalk
Software User's Guide for more information on the different components of
DECtalk Software.

  1. Create a local directory in which to build the sample programs.

  2. Copy all the files in /usr/examples/dtk/dtsamples into the directory
     that you just created.

  3. Generate a Makefile from the Imakefile by typing:

     /usr/bin/X11/xmkmf

  4. Compile and link the sample application programs by typing the
     following while still in the directory that you just created:

     make all

  5. After the make program completes successfully, the sample programs
     are ready to run.

In addition to the sample programs, you will also find some demo text files
in your directory. These files demonstrate some of the DECtalk Software
capabilities.
---------------------------------------------------------------------------

 Programming

This section describes the DECtalk API programming environment.
Topics include:

  1.  Header files

  2.  Shareable libraries

  3.  Compiling and linking applications

Header Files

DECtalk provides three header files that contain all the public
data-structure definitions that the DECtalk Software API references. They
are ttsapi.h, dtmmedefs.h, and engphon.h. When DECtalk Software is
installed, these files are in /usr/include/dtk.

   *  ttsapi.h contains definitions of constants used in the DECtalk
     Software API calls, data structures that define the buffers that
     DECtalk Software returns, and the API function prototype definitions.

   *  dtmmedefs.h contains the basic data structure definitions used by
     DECtalk Software. It also contains definitions of error codes and
     audio formats. This file enables you to compile, link, and run certain
     DECtalk programs even if Multimedia Services for Digital UNIX is not
     installed. Specifically, if you are writing an application program
     that does not use the audio drivers but uses DECtalk Software to
     produce synthesized speech buffers (via the TextToSpeechOpenInMemory()
     and related calls), then using dtmmedefs.h circumvents the requirement
     for Multimedia Services for Digital UNIX.

   *  engphon.h contains a list of American English phoneme codes.

---------------------------------------------------------------------------

 Shareable Libraries

DECtalk Software APIs are available to programmers in two shareable
libraries.

   *  libtts.so contains device-independent DECtalk Software routines.

   *  libttsmme.so contains the DECtalk Software library that requires
     Multimedia Services for Digital UNIX.

As in the case of the header files, if you want to use DECtalk Software to
write an application that produces buffers of synthesized speech, then link
the program with libtts.so. If, on the other hand, you want to use
Multimedia Services for Digital UNIX to communicate with the audio
subsystem, then link the application with libttsmme.so.
---------------------------------------------------------------------------

Chapter 4:
Customizing a DECtalk Software Voice



The DECtalk Software voices provide an adequate selection for most
applications. However, if you have a special application requiring a
monotone or unusual voice, you can modify the parameters provided in this
section to design your own voice.

   * Customizing a DECtalk Software Voice
        o Parameters [:dv_]
        o Changing Sex and Head Size
        o Changing Voice Quality
        o Changing Pitch and Intonation

---------------------------------------------------------------------------
 Parameters [:dv_]

The nine built-in voices of DECtalk are distinguished from one another by a
large set of speaker-definition parameters.

Speakers can differ in sex, age, head size and shape, larynx size and
behavior, pitch range, pitch and timing habits, dialect, and emotional
state. DECtalk Software cannot approximate all of these options. Therefore,
the space of distinguishable voices is limited, even though DECtalk
Software has many speaker-definition parameters that can be modified.

The design voice [:dv _] command introduces the speaker-definition
parameters that can be entered as a string or one at a time.

The following sections discuss speech production, acoustics, and
perception. Some of the information is relatively technical, but the
examples should make it possible for all developers to modify any parameter
effectively and listen to the results.
---------------------------------------------------------------------------

 Changing Sex and Head Size

Six speaker-definition parameters control the size and shape of the head.
These parameters are listed below and described later in this chapter.

sx    Sex, 1 (male) or 0 (female)
hs    Head size, in %
f4    Fourth formant resonance frequency, in Hz
f5    Fifth formant resonance frequency, in Hz
b4    Fourth formant bandwidth, in Hz
b5    Fifth formant bandwidth, in Hz

Sex, sx

Male and female voices have many differences, including head size, pharynx
length, larynx mass, and speaking habits such as degree of breathiness,
liveliness of pitch, choice of articulatory target values, and speed of
articulation. Some of these differences are under the control of a single
parameter, sx, the sex of the speaker. Speakers Paul, Harry, Frank, and
Dennis are male (sx = 1), while speakers Betty, Rita, Ursula, Wendy, and
Kit are female (sx = 0). Actually, Kit the Kid can be male or female
because children younger than 10 years old have similar voices for both
sexes.

Changing the sx parameter causes DECtalk Software to access a different
(male or female) table of target values for formant frequencies,
bandwidths, and source amplitudes. The male and female tables are patterned
after two individuals who were judged to have pleasant, intelligible
voices. DECtalk Software's built-in voices are simply scaled
transformations of Paul and Betty, the two basic voices.

You can change the sex of any of DECtalk Software's voices by making the
voice current and then modifying the sx parameter. For example, the
following command gives Paul some of the speaking characteristics of a
woman. (The sx parameter does not change the average pitch or breathiness,
so a peculiar combination of simultaneous male and female traits results
from this sx change.)

[:np :dv sx 0] Am I a man or woman?

The sx parameter can also be specified as m or f with the commands [:dv sx
m] or [:dv sx f].

Note
If you change the sex of the voice, some phonemes might cause DECtalk
Software's filters to overload, producing a squawk. The modification of
certain parameters such as f4, f5, and g1 (explained in a later section)
can help to correct this problem.

Head Size, hs

Head size (hs) is specified as a percentage of the average size for an
adult man (if sx = 1) or an adult woman (if sx = 0). A head size of 100 % is
normal or average for a given sex, but people can differ significantly in
this characteristic. Head size has a strong influence on a person's voice.
Large musical instruments produce low notes, and humans with large heads
tend to have low, resonant voices. For example, to make Paul sound like a
larger man with a 15 % longer vocal tract (and formant frequencies that are
scaled down by a factor of about 0.85), use the following command:

[:np :dv hs 115] Do I sound more like Huge Harry this way?

Head size is one of the best variables to use if you want to make dramatic
voice changes. For example, Paul has a head size of 100, while Harry's deep
voice is caused in part by a head-size change to 115, or 15 % greater than
normal. Decreasing head size produces a higher voice, such as that of a
child or adolescent. Extreme changes in head size, as in the following
examples, produce speech that is somewhat difficult to understand.

[:nh :dv hs 135] Do I have a swelled head?

[:nk] I am about 10 years old.

[:nk :dv hs 65] Do I sound like a six year old?

Note

Extreme changes in head size can cause overloads, as well as difficulties
in understanding the speech. The modification of certain parameters such as
f4, f5, and g1 can help to correct this problem. (See the next section.)

Higher Formants, f4, f5, b4, and b5

A male voice typically has five prominent resonant peaks in the spectrum
(over the range from 0 to 5 kHz), a female voice typically has only four
(because of a smaller head size), and a child has three. If fourth and
fifth formant resonances exist for a particular voice, they are fixed in
frequency and bandwidth characteristics. These characteristics are
specified, in Hz, by the parameters f4, f5, b4, and b5.

If a higher formant does not exist, the frequency and bandwidth of the
speaker definition are set to special values that cause the resonance to
disappear. To make a resonance disappear, the frequency is set to above
5500 Hz and the bandwidth is set to 5500 Hz. (This disables the formant
filter.) This is what has been done to the fourth and fifth formants for
Kit.

The permitted values for f4 and f5 have fairly complicated restrictions.
Violating these restrictions can cause overloads and squawks. The following
restrictions apply to cases where a higher formant exists:

   * f5 must be at least 300 Hz higher than f4.

   * If sx is 1 (male), f4 must be at least 3250 Hz.

   * If sx is 0 (female), f4 must be at least 3700 Hz.

   * If hs is not 100, the preceding values should be multiplied by
     (hs / 100).
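
The restrictions above can be collected into a single check. The following
C fragment is a minimal sketch and is not part of the DECtalk Software API.
The function name higher_formants_ok is hypothetical, and the code assumes
that all three limits (the two f4 minimums and the 300 Hz separation) are
scaled by hs / 100, which is only one reading of the last restriction. It
applies only to voices in which the higher formants exist.

```c
#include <stdbool.h>

/* Hypothetical helper, not a DECtalk API call.
   sx: 1 = male, 0 = female; hs: head size in %; f4, f5: in Hz. */
bool higher_formants_ok(int sx, int hs, int f4, int f5)
{
    /* Minimum f4 from the text: 3250 Hz (male) or 3700 Hz (female). */
    double min_f4  = (sx == 1) ? 3250.0 : 3700.0;
    /* Minimum separation between f5 and f4 from the text. */
    double min_gap = 300.0;
    /* Assumption: every limit is scaled by the head size. */
    double scale   = hs / 100.0;

    return f4 >= min_f4 * scale && (f5 - f4) >= min_gap * scale;
}
```

A front end could run a check of this kind before sending a [:dv f4 ... f5
...] command, and warn the user instead of producing a squawk.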

These higher formants produce peaks in the spectrum that become more
prominent if b4 and b5 are smaller, and if f4 and f5 are closer together.
The limits placed on b4 and b5 should ensure that no problems occur.
However, smaller values for bandwidths may produce an overload in the
synthesizer. You can correct these overloads by increasing the bandwidths
or by changing the gain control g1.

---------------------------------------------------------------------------

 Changing Voice Quality

Six speaker-definition parameters control aspects of the output of the
larynx, which, in turn, control voice quality. These parameters are listed
as follows:

br    Breathiness, in decibels (dB)
lx    Lax breathiness, in %
sm    Smoothness, in %
ri    Richness, in %
nf    Number of fixed samples of open glottis
la    Laryngealization, in %

Breathiness, br

Some voices can be characterized as breathy. In a breathy voice, the vocal
folds vibrate to generate voicing and breath noise simultaneously.
Breathiness is a
characteristic of many female voices, but it is also common under certain
circumstances for male voices.

The range of the br parameter is from 0 dB (no breathiness) to 70 dB
(strong breathiness). By experimenting, you can learn what intermediate
values sound like. For example, to turn Paul into a breathy, whispering
speaker, use the following command:

[:np :dv br 55 gv 56] Do I sound more like Dennis now?

This voice is not as loud as the others because of the simultaneous
decrease in the gain of voicing (gv), but it is intelligible and human
sounding.

Lax Breathiness, lx

The br parameter creates simultaneous breathiness whenever voicing is
turned on. Another type of breathiness occurs only at the ends of sentences
and when going from voiced to voiceless sounds. This type of "lax"
breathiness is controlled by the lx parameter in %.

A nonbreathy, tense voice would have lx set to 0, while a maximally
breathy, lax voice would have lx set to 100. The difference between these
two voices is not great, but you can hear it if you listen closely.

Smoothness, sm

Smoothness refers to how gently the vocal folds close during each
vibration. In a smooth voice, the vocal folds meet at the midline, as they
do in normal voicing, but they do not slam together forcefully to create a
very sudden cessation of airflow.

DECtalk Software uses a variable-cutoff, gradual low-pass filter to model
changes to smoothness. The range of sm is from 0 % (least smooth and most
brilliant) to 100 % (most smooth and least brilliant). The voicing source
spectrum is tilted so that energy at higher frequencies is attenuated by as
much as 30 dB when sm is set to the maximum but is not attenuated at all
when sm is set to 0.

Professional singing voices that are trained to sing above an orchestra are
usually brilliant, while anyone who talks softly becomes breathy and
smooth. To synthesize a breathy voice, an sm value of about 50 or more is
good. Changes to sm do not have a great effect on perceived voice quality.

Richness, ri

Richness is similar to smoothness and brilliance except that the spectral
change occurs at lower frequencies and is caused by a different
physiological mechanism. Brilliant, rich voices carry well and are more
intelligible in noisy environments, while smooth, soft voices sound more
friendly. For example, the following command produces a soft, smooth
version of Paul's voice:

[:np :dv ri 0 sm 70] Do I sound more mellow?

The following command produces a maximally rich and brilliant (forceful)
voice:

[:np :dv ri 90 sm 0] Do I sound more forceful?

Smoothness and richness are usually negatively correlated when a speaker
dynamically changes laryngeal output. The sm and ri parameters do not
influence the speaker's identity very much.

Nopen Fixed, nf

The number of samples in the open part of the glottal cycle is determined
not only by ri, but also by a second parameter, nf. The nf parameter is the
number of fixed samples in the open portion of the glottal cycle.

Most speakers adjust the open phase to be a certain fraction of the period,
and this fraction is determined by ri. Other speakers keep the open phase
fixed in duration when the overall period varies. To simulate this
behavior, set ri to 100 and adjust nf to the desired duration of the open
phase. The shortest possible open phase is 10 (1 ms), and the longest is
three quarters of the period duration (about 70 for a male voice).
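
The open-phase limits can be worked out from the pitch period. The
following C sketch assumes a rate of 10 samples per millisecond, which is
inferred from the statement that an open phase of 10 corresponds to 1 ms.
The names nf_max and nf_clamp are hypothetical and are not part of the
DECtalk Software API.

```c
/* Assumption: 10 samples per millisecond, inferred from "10 (1 ms)". */
#define SAMPLES_PER_SECOND 10000

/* Shortest open phase given in the text, in samples. */
#define NF_MIN 10

/* Longest open phase: three quarters of the pitch period, in samples.
   f0_hz must be greater than zero. */
int nf_max(int f0_hz)
{
    int period_samples = SAMPLES_PER_SECOND / f0_hz;
    return (period_samples * 3) / 4;
}

/* Limit a requested nf value to the range described in the text. */
int nf_clamp(int nf, int f0_hz)
{
    int highest = nf_max(f0_hz);
    if (nf < NF_MIN) return NF_MIN;
    if (nf > highest) return highest;
    return nf;
}
```

For a pitch of 107 Hz, the period is 93 samples and nf_max returns 69,
which is in line with the figure of about 70 quoted for a male voice.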

Laryngealization, la

Many speakers turn voicing on and off irregularly at the beginnings and
ends of sentences, which gives a querulous tone to the voice. This
departure from perfect periodicity is called laryngealization or creaky
voice quality.

The la parameter controls the amount of laryngealization in the voice. A
value of 0 results in no laryngealized irregularity, and a value of 100
(the maximum) produces laryngealization at all times. For example, to make
Betty moderately laryngealized, type the following command:

[:nb :dv la 20]

The la parameter creates a noticeable difference in the voice, although it
is not altogether a pleasant change.
---------------------------------------------------------------------------

 Changing Pitch and Intonation

Seven speaker-definition parameters control aspects of the fundamental
frequency (f0) contour of the voice. These parameters are listed below and
are described in the chapter on modifying voices.

bf    Baseline fall, in Hz
hr    Hat rise, in Hz
sr    Stress rise, in Hz
as    Assertiveness, in %
qu    Quickness, in %
ap    Average pitch, in Hz
pr    Pitch range, in %

Baseline Fall, bf

The bf parameter, in Hz, determines one aspect of the dynamic fundamental
frequency contour for a sentence. If bf is 0, the reference baseline
fundamental frequency of a sentence begins and ends at 115 Hz. All
rule-governed dynamic swings in f0 are computed with respect to the
reference baseline.

Some speakers begin a sentence at a higher f0 and gradually fall as the
sentence progresses. This "falling baseline" behavior can be simulated by
setting bf to the desired fall in Hz. For example, setting bf to 20 Hz
causes the f0 pattern for a sentence to begin at 125 Hz (115 Hz plus half
of bf) and to fall at a rate of 16 Hz per second until it reaches 105 Hz
(115 Hz minus half of bf). The baseline remains at this lower value until
it is reset automatically before the beginning of the next full sentence
(right after a period, question mark, or exclamation point). The rate of
fall (16 Hz per second) is fixed, regardless of the extent of the fall.
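
The baseline behavior described above can be written as a small function of
time. The following C fragment is a sketch based only on the figures in
this section (a 115 Hz reference and a fixed fall of 16 Hz per second). The
name baseline_hz is hypothetical, and the code does not model the higher
starting point used after a [+] phoneme.

```c
#define BASELINE_REF_HZ    115.0  /* reference baseline from the text */
#define BASELINE_FALL_RATE  16.0  /* fixed rate of fall, in Hz per second */

/* Baseline f0, in Hz, t_sec seconds after the start of a sentence. */
double baseline_hz(double bf_hz, double t_sec)
{
    double start  = BASELINE_REF_HZ + bf_hz / 2.0;  /* 125 Hz when bf is 20 */
    double lowest = BASELINE_REF_HZ - bf_hz / 2.0;  /* 105 Hz when bf is 20 */
    double value  = start - BASELINE_FALL_RATE * t_sec;

    /* The baseline stays at the lower value once it gets there. */
    return (value < lowest) ? lowest : value;
}
```

With bf set to 20, the baseline reaches its lower value of 105 Hz after
1.25 seconds (20 Hz divided by 16 Hz per second) and stays there.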

Whenever you include a [+] phoneme in the text to indicate the beginning of
a paragraph, the baseline is automatically set to begin slightly higher for
the first sentence of the paragraph. While baseline fall differs among the
speakers, it is not a good cue for differentiating among them. As long as
the fall is not excessive, its presence or absence is hardly noticeable.

Hat Rise, hr, and Stress Impulse Rise, sr

The hr (nominal hat rise, in Hz) and sr (nominal stress impulse rise, in
Hz) parameters determine aspects of the dynamic fundamental frequency
contour for a sentence. To modify these values selectively, you should
understand how the f0 contour is computed as a function of lexical stress
pattern and syntactic structure of the sentence.

A sentence is first analyzed and broken into clauses with punctuation and
clause-introducing words to determine the locations of clause boundaries.
Within each clause, the f0 contour rises on the first stressed syllable,
stays at a high level for the remainder of the clause up to the last
stressed syllable, and falls dramatically on the last stressed syllable.
This rise-at-the-beginning and fall-at-the-end pattern has been called the
"hat pattern" by linguists, using the analogy of jumping from the brim of a
hat to the top of the hat and back down again.

The hr parameter indicates the nominal height, in Hz, of a pitch rise to a
plateau on the first stress of a phrase. A corresponding pitch fall is
placed by rule on the last stress of the phrase. Some speakers use
relatively large hat rises and falls, while others use a local
"impulse-like" rise and fall on each stressed syllable. The default hr
value for Paul is 22 Hz, indicating that the f0 contour rises a nominal 22
Hz when going from the brim to the top of the hat. To simulate a speaker
who does not use hat rises and falls, use the following command:

[:dv hr 0]

Other aspects of the hat pattern are important for natural intonation but
are not accessible by speaker-definition commands. For example, the hat
fall becomes a weaker fall followed by a slight continuation rise if the
clause is to be succeeded by more clauses in the same sentence. Also, if
unstressed syllables follow the last stressed syllable in a clause, part of
the hat fall occurs on the very last (unstressed) syllable of the clause.
If the clause is long, DECtalk Software may break it into two hat patterns
by finding the boundary between the noun phrase and the verb phrase.

If DECtalk Software is in phoneme input mode and you use the pitch rise [/]
and pitch fall [\] symbols, the hr parameter determines the actual rise and
fall in Hz.

Stress Rise, sr

The sr parameter indicates the nominal height, in Hz, of a local pitch rise
and fall on each stressed syllable. This rise-fall is added to any hat rise
or fall that is also present. For example, Paul has sr set to 32 Hz,
resulting in an f0 rise-fall gesture of 32 Hz over a span of about 150 ms,
which is located on the first and succeeding stressed syllables. However,
DECtalk Software rules reduce the actual height of successive stress rises
and falls in each clause and cause the last stress pulse to occur early so
that there is time for the hat fall during the vowel.

If the sr parameter is set too low, the speech sounds monotone within long
phrases. Great changes to hr and sr from their default values for each
speaker are not necessary or desirable, except in unusual circumstances.

Assertiveness, as

Assertive voices have a dramatic fall in pitch at the end of utterances.
Neutral or meek speakers often end a sentence with a slight "questioning"
rise in pitch to deflect any challenges to their assertions. The as
parameter, in %, indicates the degree to which the voice tends to end
statements with a conclusive final fall. A value of 100 is very assertive,
while a value of 0 is extremely meek.

Quickness, qu

The qu parameter, in %, controls the speed of response to a request to
change the pitch. All hat rises, hat falls, and stress rises can be thought
of as suddenly applied commands to change the pitch, but the larynx is
sluggish and responds only gradually to each command. A smaller larynx
typically responds more quickly, so while Harry has a quickness value of
10, Kit has a value of 50.

In engineering terms, a value of 10 implies a time constant (time to get to
70 % of a suddenly applied step target) of about 100 ms. A value of 90
corresponds to a time constant of about 50 ms. Lower quickness values may
mean that the f0 never reaches the target value before a new command comes
along to change the target.

Average Pitch, ap, and Pitch Range, pr

The ap (average pitch, in Hz) and pr (pitch range, in % of normal range)
parameters modify the computed values of fundamental frequency, f0,
according to the formula:

f0' = ap + (((f0 - 120) * pr) / 100)

If ap is set to 120 Hz and pr to 100 %, there will be no change to the
"normal" f0 contour that is computed for a typical male voice. The effect
of a change in ap is simply to raise or lower the entire pitch contour
independently by a constant number of Hz, whereas the effect of pr is to
expand or contract the swings in pitch about 120 Hz.

Normally, a smaller larynx simultaneously produces f0 values that are
higher in average pitch and higher in pitch range by about the same factor
(the whole f0 contour is multiplied by a constant factor). Observing the
values assigned to ap and pr for each of the voices, you can see that the
voices rank in average pitch from low (Harry) to high (Kit).

Rankings for pr are similar, except that Frank has a flat, nonexpressive
pitch range as compared with his average pitch.

The best way to determine a good pitch range for a new voice is by trial
and error. You can create a monotone or robotlike voice by setting the
pitch range to 0. For example, to make Harry speak in a monotone at exactly
90 Hz, type the following command:

[:nh :dv ap 90 pr 0] I am a robot.

Reducing the pitch range reduces the dynamics of the voice, suggesting
emotions such as sadness in the speaker. Increasing the pitch range while
leaving the average pitch the same or setting it slightly higher suggests
excitement.

Due to constraints involved in pitch-synchronous updating of other
dynamically changing parameters, the fundamental frequency contour that is
computed by the preceding formula is then checked for values that are
outside the following limits.

f0 maximum = 500 Hz

f0 minimum = 50 Hz

Any value outside this range is limited to fall within the range.
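
The formula and the limits above can be combined in a few lines of C. This
is a sketch of the arithmetic described in this section, not the code used
inside DECtalk Software. The function name adjust_f0 is hypothetical.

```c
#define F0_MAX_HZ 500.0  /* upper limit from the text */
#define F0_MIN_HZ  50.0  /* lower limit from the text */

/* Apply ap (in Hz) and pr (in %) to a computed f0 value (in Hz),
   then limit the result to the permitted range. */
double adjust_f0(double f0_hz, double ap_hz, double pr_percent)
{
    double out = ap_hz + ((f0_hz - 120.0) * pr_percent) / 100.0;

    if (out > F0_MAX_HZ) out = F0_MAX_HZ;
    if (out < F0_MIN_HZ) out = F0_MIN_HZ;
    return out;
}
```

With ap set to 120 and pr set to 100, the function returns its input
unchanged for any value inside the limits. With pr set to 0, every input
maps to ap, which is the monotone case shown in the Harry example.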

To keep you from exceeding reasonable limits on the parameters that control
pitch, certain constraints apply to the values selected. If a [:dv _]
command specifies values outside these limits, the value is limited to the
nearest listed value before execution.

---------------------------------------------------------------------------

 Changing Relative Gains and Avoiding Overloads

Nine speaker-definition parameters control the output levels of various
internal resonators. These parameters are:

gv    Gain of voicing source, in dB
gh    Gain of aspiration source, in dB
gf    Gain of frication source, in dB
gn    Gain of nasalization, in dB
g1    Gain of cascade formant resonator 1, in dB
g2    Gain of cascade formant resonator 2, in dB
g3    Gain of cascade formant resonator 3, in dB
g4    Gain of cascade formant resonator 4, in dB
g5    Loudness of the voice, in dB

Loudness, g5

Each predefined voice has been adjusted to have about the same perceived
loudness -- a value that is optimal for telephone conversation. The value
chosen is near maximum. (If loudness were increased much, some phonemes
would probably cause an overload squawk.) A near-maximum value was selected
to maximize the signal-to-noise level of DECtalk Software.

If you want to decrease the loudness of a voice or temporarily increase the
loudness of a phrase that is known not to overload, determine the g5 value
in dB for the voice in question. Then adjust the voice by using the
following command:

[:np :dv g5 76] I am speaking at about half my normal level.

Because the g5 entry for Paul is 86, this command reduces loudness by 10
dB. Perceived loudness approximately doubles (or halves) for each 10 dB
increment (or decrement) in g5.

Software control over loudness is useful in a loudspeaker application where
the background noise level in the room might change. For example, a vocally
handicapped, wheelchair-bound person does not want to appear to be shouting
in a quiet interpersonal conversation, but he or she may want to be able to
converse in a noisy room as well. Using a software abbreviation facility,
such a person could type "lo" to select a command making the voice
maximally loud, or "sof" to invoke a command setting g5 to a reduced value.

Note
DECtalk Software comes with volume control so that modification of the g5
parameter should not be necessary. Using the [:volume ...] command or the
volume control knob on the external loudspeaker is recommended.

Sound Source Gains, gv, gh, gf, and gn

Several types of sound sources are activated during speech production:
voicing, aspiration, frication, and nasalization. The relative output
levels of these sounds, in dB, are determined by the gv, gh, gf and gn
parameters, respectively. The default settings for these parameters have
been factory preset to maximize the intelligibility of each voice. However,
changing the settings can be useful in debugging the system or in
demonstrating aspects of the acoustic theory of speech production. You can
change the level of one sound source globally; for example, you can turn
off frication to hear just the output of the larynx. You might need
to reduce these parameters to overcome certain kinds of overloads, but try
the procedure described in the next section first.

Cascade Vocal Tract Gains, g1, g2, g3, and g4

Changes in head size or other parameters can sometimes produce overloads in
the synthesizer circuits. If this occurs, make sure that f4 and f5 are set
to reasonable values. If the squawk remains, you can adjust several gain
controls -- g1 through g4, in dB -- in the cascade of formant resonators of
the synthesizer to attenuate the signal at critical points. These gains can
then be amplified back to desired output levels later in the synthesis.

Use the following procedure to correct an overload (typically indicated by
a squawk during part of a word):

Synthesize the word or phrase several times to make sure the squawk occurs
consistently. Use the same test word each time a change to a gain is made.

Determine the default values for g1 through g4 for the speaker that
overloads.

Reduce g1 by increments of 3 until the squawk goes away. When the squawk
goes away, note the reduction that was needed. If more than a 10 dB
decrement is required, some other parameter has probably been changed too
much. If the squawk does not go away at all, then you may need to reduce gv
instead of g1.

Increase g5 to return the output to its original level. For example, if g1
was reduced by 6 dB, add 6 dB to g5 (or to g4 if g5 is already at a
maximum). If incrementing g5 causes the squawk to return, then decrease g5
slowly until the squawk goes away.
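
The arithmetic in this procedure can be sketched in C as follows. The
function names are hypothetical and are not part of the DECtalk Software
API. The code only computes the gain values and formats a [:dv] command;
whether the squawk has gone away must still be judged by listening. The
default g1 value of 65 used in the usage note is an invented placeholder,
while the g5 value of 86 is the one quoted for Paul in this chapter.

```c
#include <stdio.h>

#define G1_STEP_DB      3  /* step size used in the procedure */
#define G1_MAX_CUT_DB  10  /* a larger cut suggests another cause */

/* Reduction of g1, in dB, to try on trial 1, 2, 3, and so on.
   Returns 0 when the trial number is invalid or the cut would
   exceed the 10 dB guideline. */
int g1_reduction(int trial)
{
    int cut = G1_STEP_DB * trial;
    return (trial >= 1 && cut <= G1_MAX_CUT_DB) ? cut : 0;
}

/* Format one command that lowers g1 by cut_db and raises g5 by the
   same amount. Returns the value returned by snprintf. */
int overload_fix_command(char *buf, size_t len,
                         int g1_default, int g5_default, int cut_db)
{
    return snprintf(buf, len, "[:dv g1 %d g5 %d]",
                    g1_default - cut_db, g5_default + cut_db);
}
```

For example, with a placeholder g1 default of 65 and Paul's g5 of 86, a
6 dB cut produces the command [:dv g1 59 g5 92]. If raising g5 brings the
squawk back, lower g5 again by hand as described above.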

This procedure works in most cases, but using g2 rather than g1 can work
better. If you can return g1 to its factory-preset value and reduce g2
instead to make the squawk go away, then the signal-to-quantization-noise
level in g1 remains maximized. If you can eliminate the squawk by using g3
or g4 rather than g2, more of the cascaded resonator system can be made
immune to quantization noise accumulation.

The [save] Parameter and [:nv] Voice

You can save a modified speaker definition in a buffer while synthesizing
speech with one of the other voices. The Val voice [:nv] is either male or
female, depending on what values are stored in the buffer. If you call Val
before storing any values in the buffer, DECtalk Software uses the Perfect
Paul voice [:np]. The following commands store a modified Betty voice in
Val and then recall it.

[:nb :dv sx m save ]

(Store the modified Betty voice in Val.)

[:np] I am Paul.

(Use another voice.)

[:nv] I am Val.

(Recall the Val [modified-Betty] voice.)

The buffer holds its contents until you power down DECtalk Software. You
must reenter new voice characteristics if you turn off DECtalk Software.

Note
If you want to use the save command, leave a space between the command and
the trailing bracket; for example, [:dv save ].

Summary on Speaker-Definition Parameters

Of the 27 parameters, only a few cause dramatic changes in the voice. The
greatest effects are obtained with changes to hs, ap, pr, and sx, while
moderate changes occur when modifying la and br. To some extent, DECtalk
Software's nine predefined speakers cover most of the possible voices, so
don't expect to be able to find a voice that is highly novel and
intelligible. However, you might easily find ways to improve one of the
standard voices slightly.
---------------------------------------------------------------------------

Chapter 5:
DECtalk Software API Function Calls



This chapter describes the DECtalk Software API functions in alphabetical
order. It includes the following topics:

   * Control and Status Functions
   * Text-to-Speech Modes
   * Text-to-Speech Functions: Alphabetical Listing
   * Functions Listed by Category
   * TextToSpeechAddBuffer
   * TextToSpeechCloseInMemory
   * TextToSpeechCloseLogFile
   * TextToSpeechCloseWaveOutFile
   * TextToSpeechGetCaps
   * TextToSpeechGetLanguage
   * TextToSpeechGetRate
   * TextToSpeechGetSpeaker
   * TextToSpeechGetStatus
   * TextToSpeechLoadUserDictionary
   * TextToSpeechOpenInMemory
   * TextToSpeechOpenLogFile
   * TextToSpeechOpenWaveOutFile
   * TextToSpeechPause
   * TextToSpeechReset
   * TextToSpeechResume
   * TextToSpeechReturnBuffer
   * TextToSpeechSetLanguage
   * TextToSpeechSetRate
   * TextToSpeechSetSpeaker
   * TextToSpeechShutdown
   * TextToSpeechSpeak
   * TextToSpeechStartup
   * Loading of the Main Pronunciation Dictionary
   * Loading of the User Dictionary
   * TextToSpeechSync
   * TextToSpeechUnloadUserDictionary

Conventions used in API functions

bold          Bold text is used to indicate function
              names, data structures, and field names.

italics       Italic text is used to indicate function
              arguments and to emphasize important
              information.

---------------------------------------------------------------------------

 Control and Status Functions

The functions described in the following table provide additional control
and status information for the text-to-speech system.

Function                   Description
TextToSpeechSetSpeaker()   Sets the speaker's voice (which becomes
                           active at the next clause boundary).
TextToSpeechGetSpeaker()   Returns the value of the last speaker to
                           have spoken. This value might not be the
                           value previously set by the
                           TextToSpeechSetSpeaker() function.
TextToSpeechSetRate()      Sets the speaking rate, which becomes
                           active at the next clause boundary.
TextToSpeechGetRate()      Gets the speaking rate (the current rate
                           setting is returned even if it has not
                           been activated).
TextToSpeechSetLanguage()  Sets the text-to-speech system language.
                           (Currently, this must be
                           TTS_AMERICAN_ENGLISH.)
TextToSpeechGetLanguage()  Returns the current text-to-speech system
                           language.
TextToSpeechGetStatus()    Returns various text-to-speech system
                           parameters, such as the number of
                           characters in the text pipe, the ID of the
                           wave output device, and a Boolean value
                           that indicates whether the system is
                           speaking or silent.
TextToSpeechGetCaps()      Returns the capabilities of the
                           text-to-speech system, which include the
                           version number of the system, the number
                           of speakers, the maximum and minimum
                           speaking rate, and the supported
                           languages.

---------------------------------------------------------------------------

 Text-to-Speech Modes

After the TextToSpeechStartup() function is called by an application, it
can then call the TextToSpeechSpeak() function to speak text. The
application can also use the text-to-speech API to select different modes.
These modes allow for writing wave files; writing a log file, which can
contain text, phonemes, or syllables; or writing the audio (speech) samples
to memory. Each mode-switch function has a corresponding function to return
the text-to-speech system to the startup state. These functions are listed
below.

Open                           Close
TextToSpeechOpenWaveOutFile()  TextToSpeechCloseWaveOutFile()
TextToSpeechOpenLogFile()      TextToSpeechCloseLogFile()
TextToSpeechOpenInMemory()     TextToSpeechCloseInMemory()

The text-to-speech system must be in the startup state before calling any
of the Open functions listed above. The corresponding Close functions
return the system to the startup state.
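
The rule above can be pictured as a small state machine. The following C
fragment is an illustrative model only: it does not call the DECtalk
Software API, and the type and function names (tts_mode_t, model_open,
model_close) are hypothetical. It also assumes that a Close call that does
not match the current mode is rejected, which this guide does not state.

```c
/* Modes of the model; MODE_STARTUP is the state after startup. */
typedef enum {
    MODE_STARTUP,
    MODE_WAVE_FILE,
    MODE_LOG_FILE,
    MODE_IN_MEMORY
} tts_mode_t;

/* An Open succeeds only from the startup state. Returns 1 on success. */
int model_open(tts_mode_t *mode, tts_mode_t requested)
{
    if (*mode != MODE_STARTUP || requested == MODE_STARTUP)
        return 0;
    *mode = requested;
    return 1;
}

/* The matching Close returns the model to the startup state.
   Assumption: a mismatched Close is rejected and changes nothing. */
int model_close(tts_mode_t *mode, tts_mode_t closing)
{
    if (closing == MODE_STARTUP || *mode != closing)
        return 0;
    *mode = MODE_STARTUP;
    return 1;
}
```

In this model, an application that is writing a log file must close it
before it can open a wave file or switch to speech-to-memory mode.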

---------------------------------------------------------------------------


Text-to-Speech Functions: Alphabetical Listing

TextToSpeechAddBuffer()

TextToSpeechCloseInMemory()

TextToSpeechCloseLogFile()

TextToSpeechCloseWaveOutFile()

TextToSpeechGetCaps()

TextToSpeechGetLanguage()

TextToSpeechGetRate()

TextToSpeechGetSpeaker()

TextToSpeechGetStatus()

TextToSpeechLoadUserDictionary()

TextToSpeechOpenInMemory()

TextToSpeechOpenLogFile()

TextToSpeechOpenWaveOutFile()

TextToSpeechPause()

TextToSpeechReset()

TextToSpeechResume()

TextToSpeechReturnBuffer()

TextToSpeechSetLanguage()

TextToSpeechSetRate()

TextToSpeechSetSpeaker()

TextToSpeechShutdown()

TextToSpeechSpeak()

TextToSpeechStartup()

TextToSpeechSync()

TextToSpeechUnloadUserDictionary()
---------------------------------------------------------------------------


Functions Listed by Category

Function                            Purpose

Core API Functions
TextToSpeechStartup()               Initializes and starts up the
                                    text-to-speech system.
TextToSpeechSpeak()                 Speaks text from a buffer.
TextToSpeechShutdown()              Shuts down the text-to-speech
                                    system.

Audio Output Control Functions
TextToSpeechPause()                 Pauses output.
TextToSpeechResume()                Resumes output.
TextToSpeechReset()                 Purges the text-to-speech system
                                    and stops output.

Blocking Synchronization Function
TextToSpeechSync()                  Synchronizes to the text stream.

Control and Status Functions
TextToSpeechSetSpeaker()            Selects one of nine speaking
                                    voices.
TextToSpeechGetSpeaker()            Returns the last speaking voice to
                                    have spoken.
TextToSpeechSetRate()               Sets the speaking rate of the
                                    text-to-speech system.
TextToSpeechGetRate()               Gets the speaking rate of the
                                    text-to-speech system.
TextToSpeechSetLanguage()           Sets the language to be used.
TextToSpeechGetLanguage()           Returns the language in use.
TextToSpeechGetStatus()             Gets the status of the
                                    text-to-speech system.
TextToSpeechGetCaps()               Retrieves the capabilities of the
                                    text-to-speech system.

Special Text-To-Speech Modes
TextToSpeechOpenWaveOutFile()       Opens a file for output.
                                    TextToSpeechSpeak() writes audio
                                    data in wave format to this file.
TextToSpeechCloseWaveOutFile()      Closes the specified wave file.
TextToSpeechOpenLogFile()           Opens a log file.
TextToSpeechCloseLogFile()          Closes a log file.
TextToSpeechOpenInMemory()          Produces buffered speech samples
                                    in shared memory.
TextToSpeechCloseInMemory()         Returns the text-to-speech system
                                    to its normal state.
TextToSpeechAddBuffer()             Adds a shared-memory buffer to the
                                    memory buffer list.
TextToSpeechReturnBuffer()          Returns the current shared-memory
                                    buffer.

Loading and Unloading a User Dictionary
TextToSpeechLoadUserDictionary()    Loads user dictionary.
TextToSpeechUnloadUserDictionary()  Unloads user dictionary.


---------------------------------------------------------------------------

 TextToSpeechAddBuffer

This function adds a buffer to the memory list the application uses in the
speech-to-memory mode.

Syntax

MMRESULT TextToSpeechAddBuffer  (LPTTS_HANDLE_T phTTS,
                                 LPTTS_BUFFER_T pTTSbuffer)

Parameters

LPTTS_HANDLE_T phTTS        A pointer to a structure of
                            type TTS_HANDLE_T.
LPTTS_BUFFER_T pTTSbuffer   A pointer to a structure of
                            type TTS_BUFFER_T.

Return Value

This function returns a value of type MMRESULT. The value is zero if the
function is successful. The return value is one of the following constants:

         Constant           Description
MMSYSERR_NOERROR            Normal successful
                            completion.
MMSYSERR_INVALPARAM         Invalid parameter.
MMSYSERR_ERROR              Output to memory not
                            enabled or unable to create
                            a system object.
MMSYSERR_INVALHANDLE        The text-to-speech handle
                            was invalid.
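
The constants in the table above can be turned into readable diagnostics
with a small helper. The following is a minimal sketch, not part of the
DECtalk Software API: the names tts_result_text() and tts_check() are
hypothetical, and the stand-in definitions exist only so that the fragment
compiles without the DECtalk headers. The numeric values of the stand-in
constants are assumptions. In a real program, MMRESULT and the MMSYSERR_*
constants come from the include file mmsystem.h.

```c
#include <stdio.h>

/* Stand-in definitions so this sketch compiles on its own. In a real
 * program these come from mmsystem.h; the numeric values are assumptions. */
#ifndef MMSYSERR_NOERROR
typedef unsigned int MMRESULT;
#define MMSYSERR_NOERROR      0
#define MMSYSERR_ERROR        1
#define MMSYSERR_ALLOCATED    4
#define MMSYSERR_INVALHANDLE  5
#define MMSYSERR_NOMEM        7
#define MMSYSERR_INVALPARAM   11
#endif

/* Hypothetical helper: map a status code to a short description. */
static const char *tts_result_text(MMRESULT result)
{
    switch (result) {
    case MMSYSERR_NOERROR:     return "normal successful completion";
    case MMSYSERR_ERROR:       return "general error";
    case MMSYSERR_ALLOCATED:   return "file already open";
    case MMSYSERR_INVALHANDLE: return "invalid text-to-speech handle";
    case MMSYSERR_NOMEM:       return "unable to allocate memory";
    case MMSYSERR_INVALPARAM:  return "invalid parameter";
    default:                   return "unrecognized status";
    }
}

/* Hypothetical helper: report a failed call on stderr.
 * Returns 1 if the call succeeded and 0 otherwise. */
static int tts_check(const char *what, MMRESULT result)
{
    if (result == MMSYSERR_NOERROR)
        return 1;
    fprintf(stderr, "%s failed: %s\n", what, tts_result_text(result));
    return 0;
}
```

An application could wrap each call in this way, for example
tts_check("TextToSpeechAddBuffer", TextToSpeechAddBuffer(phTTS, pTTSbuffer)),
and stop queuing buffers when the helper returns 0.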

Comments

The application must have previously called the TextToSpeechOpenInMemory()
function before calling this function. The buffer is passed using the
structure TTS_BUFFER_T. The user must allocate the structure and the
memory buffer. The text-to-speech system returns the buffer to the
application when the buffer is full.

The structure of type TTS_BUFFER_T is returned to the application in a
message to the window procedure that corresponds to the window handle
passed to the TextToSpeechStartup() function. A pointer to the structure of
the type TTS_BUFFER_T is in the LPARAM field of the message. The message ID
value is obtained with the following call:

uiID_Buffer_Message = RegisterWindowMessage("DECtalkBufferMessage");

See the topic, Storing Speech Samples in Memory.

See Also

TextToSpeechOpenInMemory()

TextToSpeechReturnBuffer()

Storing Speech Samples in Memory

Asynchronous Messages
---------------------------------------------------------------------------

 TextToSpeechCloseInMemory

This function terminates the text-to-speech system's speech-to-memory
capability and returns the text-to-speech system to its startup state. If
audio is enabled at startup, then speech samples are routed to the audio
device.

Syntax

MMRESULT TextToSpeechCloseInMemory  (LPTTS_HANDLE_T phTTS)

Parameters

LPTTS_HANDLE_T phTTS        A pointer to a
                            text-to-speech handle.

Return Value

This function returns a value of type MMRESULT. The value is zero if the
function is successful. The return value is one of the following constants:

Constant                    Description
MMSYSERR_NOERROR            Normal successful completion.
MMSYSERR_ERROR              Output to memory not enabled
                            or unable to create a system
                            object.
MMSYSERR_INVALHANDLE        The text-to-speech handle
                            was invalid.

Comments

The TextToSpeechOpenInMemory() function must be called before calling this
function.

See Also

TextToSpeechOpenInMemory()
---------------------------------------------------------------------------

 TextToSpeechCloseLogFile

This function closes a log file opened by the TextToSpeechOpenLogFile()
function.

Syntax

MMRESULT TextToSpeechCloseLogFile   (LPTTS_HANDLE_T phTTS)

Parameters

LPTTS_HANDLE_T phTTS          A pointer to a
                              text-to-speech handle.

Return Value

This function returns a value of type MMRESULT. The value is zero if the
function is successful. The return value is one of the following constants:

Constant                      Description
MMSYSERR_NOERROR              Normal successful
                              completion.
MMSYSERR_ERROR                Failure to wait for
                              pending speech, unable to
                              close the output file, or
                              no output file is open.
MMSYSERR_INVALHANDLE          The text-to-speech handle
                              was invalid.

Comments

This function closes any open log file, even if it was opened with the
Log [:log] voice-control command. The application must have called the
TextToSpeechOpenLogFile() function before calling this function.

See Also

TextToSpeechOpenLogFile()

---------------------------------------------------------------------------

 TextToSpeechCloseWaveOutFile

This function closes a wave file opened by the
TextToSpeechOpenWaveOutFile() function.

Syntax

MMRESULT TextToSpeechCloseWaveOutFile  (LPTTS_HANDLE_T phTTS)

Parameters

LPTTS_HANDLE_T phTTS        Specifies a text-to-speech
                            handle identifying the
                            opened text-to-speech
                            device.

Return Value

This function returns a value of type MMRESULT. The value is zero if the
function is successful. The return value is one of the following constants:

Constant                    Description
MMSYSERR_NOERROR            Normal successful
                            completion.
MMSYSERR_ERROR              Failure to wait for pending
                            speech. Unable to update
                            wave file header. Unable to
                            close the wave file.
MMSYSERR_INVALHANDLE        The text-to-speech handle
                            was invalid.

Comments

The application must have previously called the
TextToSpeechOpenWaveOutFile() function before calling this function.

See Also

TextToSpeechOpenWaveOutFile()
---------------------------------------------------------------------------

 TextToSpeechGetCaps

This function provides the capabilities of the text-to-speech system by
filling in a structure of type TTS_CAPS_T. The caller must have space
allocated for this structure before calling this function.

Syntax

MMRESULT TextToSpeechGetCaps   (LPTTS_CAPS_T lpTTScaps)

Parameters

LPTTS_CAPS_T lpTTScaps      A pointer to a structure of
                            type TTS_CAPS_T. This
                            structure returns the
                            capabilities of the
                            text-to-speech system.

Return Value

This function returns a value of type MMRESULT. The value is zero if the
function is successful. The return value is one of the following constants:

Constant                    Description
MMSYSERR_NOERROR            Normal successful completion.
MMSYSERR_INVALHANDLE        The text-to-speech handle
                            was invalid.
MMSYSERR_ERROR              The pointer to the
                            TTS_CAPS_T structure was
                            invalid.

Comments

Information returned in the TTS_CAPS_T structure includes the supported
languages, proper-name pronunciation support, sample rate, minimum and
maximum speaking rate, number of predefined speaking voices, supported
character set, and version number.
---------------------------------------------------------------------------

 TextToSpeechGetLanguage

This function returns the current language.

Syntax

MMRESULT TextToSpeechGetLanguage  (LPTTS_HANDLE_T phTTS,
                                   LANGUAGE_T *pLanguage)

Parameters

LPTTS_HANDLE_T phTTS        Specifies a text-to-speech
                            handle identifying the
                            opened text-to-speech
                            device.
LANGUAGE_T *pLanguage       A pointer to a LANGUAGE_T
                            that returns the current
                            language, which is one of
                            the following:

Constant                    Description
TTS_AMERICAN_ENGLISH        Specifies American English.
                            Currently, American English
                            is the only supported
                            language (defined in include
                            file ttsapi.h).

Return Value

This function returns a value of type MMRESULT. The value is zero if the
function is successful. The return value is one of the following constants:

Constant                    Description
MMSYSERR_NOERROR            Normal successful completion.
MMSYSERR_INVALHANDLE        The text-to-speech handle
                            was invalid.

See Also

TextToSpeechSetLanguage()
---------------------------------------------------------------------------

 TextToSpeechGetRate

This function returns the current setting of the speaking rate.

Syntax

MMRESULT TextToSpeechGetRate            (LPTTS_HANDLE_T phTTS,
                                         LPDWORD pdwRate)

Parameters

LPTTS_HANDLE_T phTTS                    Specifies a text-to-speech handle and
                                        identifies the opened text-to-speech
                                        device.
LPDWORD pdwRate                         A pointer to a DWORD that is used to
                                        return the speaking rate. Valid
                                        values range from 75 to 600 words per
                                        minute.

Return Value

This function returns a value of type MMRESULT. The value is zero if the
function is successful. The return value is one of the following constants:

Constant                                Description
MMSYSERR_NOERROR                        Normal successful completion.
MMSYSERR_INVALHANDLE                    The text-to-speech handle was invalid.

Comments

This function returns the current speaking-rate setting even if a requested
rate change has not yet taken effect. (Speaking-rate changes occur on clause
boundaries.)

See Also

TextToSpeechSetRate()
---------------------------------------------------------------------------

 TextToSpeechGetSpeaker

This function returns the value of the identifier for the last voice that
has spoken.

Syntax

MMRESULT TextToSpeechGetSpeaker         (LPTTS_HANDLE_T phTTS,
                                         LPSPEAKER_T lpSpeaker)

Parameters

LPTTS_HANDLE_T phTTS                    Specifies a text-to-speech handle
                                        identifying the opened text-to-speech
                                        device.
LPSPEAKER_T lpSpeaker                   A pointer to a DWORD that returns a
                                        speaker value from the following
                                        list. These symbols are defined in
                                        include file ttsapi.h.

Speaker                                 Description
PAUL                                    Default (male) voice
HARRY                                   Full male voice
FRANK                                   Aged male voice
DENNIS                                  Male voice
BETTY                                   Full female voice
URSULA                                  Aged female voice
WENDY                                   Whispering female voice
RITA                                    Female voice
KIT                                     Child's voice

Return Value

This function returns a value of type MMRESULT. The value is zero if the
function is successful. The return value is one of the following constants:

Constant                                Description
MMSYSERR_NOERROR                        Normal successful completion.
MMSYSERR_INVALHANDLE                    The text-to-speech handle was invalid.

Comments

Note that even after a call to the TextToSpeechSetSpeaker() function, this
function returns the value for the previous speaking voice until the new
voice actually speaks.

See Also

TextToSpeechSetSpeaker()
---------------------------------------------------------------------------

 TextToSpeechGetStatus

This function returns the state of one or more text-to-speech system
parameters.

Syntax

MMRESULT TextToSpeechGetStatus          (LPTTS_HANDLE_T phTTS,
                                         DWORD dwIdentifier[ ],
                                         DWORD dwStatus[ ],
                                         DWORD dwNumberOfStatusValues)

Parameters

LPTTS_HANDLE_T phTTS                    Specifies a text-to-speech handle
                                        identifying the opened text-to-speech
                                        device.
DWORD dwIdentifier[ ]                   An array of values of type DWORD that
                                        contains identifiers specifying the
                                        status values to return in array
                                        dwStatus[ ]. These values can be one
                                        of the following constants defined in
                                        include file ttsapi.h:

Constant                                Description
INPUT_CHARACTER_COUNT                   Returns a count of the characters
                                        that the text-to-speech system is
                                        currently processing.
STATUS_SPEAKING                         The status value is TRUE if audio
                                        samples are playing and FALSE if no
                                        audio sample is playing.
WAVE_OUT_DEVICE_ID                      The current wave output device ID is
                                        returned.
DWORD dwStatus[ ]                       An array of type DWORD that contains
                                        the status values corresponding to
                                        each of the identifiers in array
                                        dwIdentifier[ ].
DWORD dwNumberOfStatusValues            A DWORD that contains the number of
                                        entries to return.

Return Value

This function returns a value of type MMRESULT. The value is zero if the
function is successful. The return value is one of the following constants:

Constant                                Description
MMSYSERR_NOERROR                        Normal successful completion.
MMSYSERR_INVALPARAM                     An invalid parameter was passed.
MMSYSERR_ERROR                          Error obtaining status values.
MMSYSERR_INVALHANDLE                    The text-to-speech handle was
                                        invalid.

Comments

The STATUS_SPEAKING status identifier has no meaning if the application is
sending speech to a wave file or sending speech to memory.

---------------------------------------------------------------------------

 TextToSpeechLoadUserDictionary

This function loads a user-defined pronunciation dictionary into the
text-to-speech system.

Syntax

MMRESULT TextToSpeechLoadUserDictionary (LPTTS_HANDLE_T phTTS,
                                         LPSTR pszFileName)

Parameters

LPTTS_HANDLE_T phTTS                    Specifies a text-to-speech handle
                                        identifying the opened text-to-speech
                                        device.
LPSTR pszFileName                       A pointer to a NULL-terminated string
                                        that specifies the name of the user
                                        dictionary file to be loaded.

Return Value

This function returns a value of type MMRESULT. The value is zero if the
function is successful. The return value is one of the following constants:

Constant                                Description
MMSYSERR_NOERROR                        Normal successful completion.
MMSYSERR_INVALHANDLE                    The text-to-speech handle was invalid.
MMSYSERR_NOMEM                          Unable to allocate memory for
                                        dictionary.
MMSYSERR_INVALPARAM                     Dictionary file not found. (Invalid
                                        dictionary file name.)
MMSYSERR_ERROR                          Illegal dictionary format or a
                                        dictionary is already loaded.

Comments

This function loads a dictionary created by the User Dictionary Build Tool
applet. The text-to-speech system loads a default user dictionary at
startup if it finds a file named user.dic in the default directory or in
the directory specified in the directory. Any previously loaded user
dictionary must be unloaded before loading a new user dictionary.

See Also

TextToSpeechUnloadUserDictionary()

Automatic Loading of a User Dictionary
---------------------------------------------------------------------------

 TextToSpeechOpenInMemory

The TextToSpeechOpenInMemory() function allows speech to be stored in
memory buffers supplied by the application. These buffers are passed to the
text-to-speech system using the TextToSpeechAddBuffer() function.

Syntax

MMRESULT TextToSpeechOpenInMemory       (LPTTS_HANDLE_T phTTS,
                                         DWORD dwFormat)

Parameters

LPTTS_HANDLE_T phTTS                    A pointer to a text-to-speech handle.
DWORD dwFormat                          An identifier that determines the
                                        audio sample format. It is one of the
                                        following constants defined in the
                                        include files mmsystem.h and
                                        ttsapi.h.

Constant                                Description
WAVE_FORMAT_11M08                       Mono, 8-bit, 11.025 kHz sample rate
WAVE_FORMAT_11M16                       Mono, 16-bit, 11.025 kHz sample rate
WAVE_FORMAT_08M08                       Mono, 8-bit μ-law, 8 kHz sample rate

Return Value

This function returns a value of type MMRESULT. The value is zero if the
function is successful. The return value is one of the following constants:

Constant                                Description
MMSYSERR_NOERROR                        Normal successful completion.
MMSYSERR_INVALPARAM                     An invalid parameter was passed. (An
                                        illegal output format value.)
MMSYSERR_NOMEM                          Unable to allocate memory.
MMSYSERR_ERROR                          Illegal output state.
MMSYSERR_INVALHANDLE                    The text-to-speech handle was invalid.

Comments

The buffer is passed using the structure TTS_BUFFER_T. The user must
allocate the structure and the memory buffer. The text-to-speech system
returns the buffer to the application when the buffer is full. The
TextToSpeechStartup() function must be called to start the text-to-speech
system before calling this function.
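
Because the application allocates the memory buffer, it is useful to know
how much audio a buffer of a given size can hold. The following is a
minimal sketch of that arithmetic for the three documented sample formats.
The type sample_format_t, its members, and the function
speech_buffer_bytes() are hypothetical names used only for illustration.
They are not the WAVE_FORMAT_* constants and are not part of the DECtalk
Software API.

```c
#include <stddef.h>

/* Hypothetical local tags for the three documented sample formats.
 * These are NOT the WAVE_FORMAT_* constants from mmsystem.h and ttsapi.h. */
typedef enum {
    FMT_11KHZ_8BIT,     /* mono, 8-bit, 11.025 kHz       */
    FMT_11KHZ_16BIT,    /* mono, 16-bit, 11.025 kHz      */
    FMT_8KHZ_MULAW      /* mono, 8-bit mu-law, 8 kHz     */
} sample_format_t;

/* Return the number of bytes needed to hold `milliseconds` of mono audio
 * in the given format, or 0 if the format is not recognized. */
static size_t speech_buffer_bytes(sample_format_t fmt,
                                  unsigned long milliseconds)
{
    unsigned long rate;     /* samples per second */
    unsigned long width;    /* bytes per sample   */
    unsigned long samples;

    switch (fmt) {
    case FMT_11KHZ_8BIT:  rate = 11025UL; width = 1UL; break;
    case FMT_11KHZ_16BIT: rate = 11025UL; width = 2UL; break;
    case FMT_8KHZ_MULAW:  rate = 8000UL;  width = 1UL; break;
    default:              return 0;
    }

    /* Round the sample count up so that the buffer is never too short. */
    samples = (rate * milliseconds + 999UL) / 1000UL;
    return (size_t)(samples * width);
}
```

For example, one second of 16-bit audio at 11.025 kHz needs 22050 bytes,
and half a second of 8 kHz μ-law audio needs 4000 bytes. The sketch does
not guard against arithmetic overflow for very long durations.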

The buffer is sent in a message to the window procedure that corresponds to
the window handle passed to the function TextToSpeechStartup(). A pointer
to the structure of the type TTS_BUFFER_T is in the LPARAM field of the
message. The message ID value can be obtained by the following call:

uiID_Buffer_Message = RegisterWindowMessage("DECtalkBufferMessage");

See the section, Storing Speech Samples in Memory, at the beginning of
this Appendix for more information.

See Also

TextToSpeechAddBuffer()

TextToSpeechCloseInMemory()

TextToSpeechReturnBuffer()

Special Text-To-Speech Modes

Storing Speech Samples in Memory
---------------------------------------------------------------------------

 TextToSpeechOpenLogFile

This function creates a file that contains text, phonemes, or syllables.
The phonemes and syllables are written using the ARPAbet alphabet. After
calling this function, all subsequent calls to the TextToSpeechSpeak()
function cause the log data to be written to a specified file until the
TextToSpeechCloseLogFile() function is called.

Syntax

MMRESULT TextToSpeechOpenLogFile        (LPTTS_HANDLE_T phTTS,
                                         LPSTR pszFileName,
                                         DWORD dwFlags)

Parameters

LPTTS_HANDLE_T phTTS                    A pointer to a text-to-speech handle.
LPSTR pszFileName                       A pointer to a NULL-terminated string
                                        that specifies the name of the log
                                        file to be opened.
DWORD dwFlags                           Specifies the type of output. It can
                                        contain one or more of the following
                                        constants:

Constant                                Description
LOG_TEXT                                Log text
LOG_PHONEMES                            Log phonemes
LOG_SYLLABLES                           Log syllable structure

Return Value

This function returns a value of type MMRESULT. The value is zero if the
function is successful. The return value is one of the following constants:

Constant                                Description
MMSYSERR_NOERROR                        Normal successful completion.
MMSYSERR_INVALPARAM                     An invalid parameter was passed.
MMSYSERR_NOMEM                          Unable to allocate memory.
MMSYSERR_ALLOCATED                      A phoneme file is already open.
MMSYSERR_ERROR                          Unable to open the output file.
MMSYSERR_INVALHANDLE                    The text-to-speech handle was invalid.

Comments

If more than one flag is passed, then the logged output is mixed in an
unpredictable fashion. If there is already a log file open, this function
returns an error. The voice-control Log command [:log] has no effect when a
log file is already open. The TextToSpeechStartup() function must be called
to start the text-to-speech system before calling this function.
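
Because combining flags mixes the logged output unpredictably, an
application may want to confirm that exactly one log type is selected before
it opens the log file. The following is a minimal sketch of such a check.
The function is_single_log_type() is a hypothetical helper, and the
stand-in flag values are placeholders chosen only so that the fragment
compiles on its own. The check assumes that each LOG_* constant occupies a
single bit; the actual values are defined in the include file ttsapi.h.

```c
/* Stand-in definitions so this sketch compiles on its own. In a real
 * program DWORD and the LOG_* constants come from the DECtalk include
 * files; the values below are placeholders, not the documented values. */
#ifndef LOG_TEXT
typedef unsigned int DWORD;
#define LOG_TEXT       0x0001
#define LOG_PHONEMES   0x0002
#define LOG_SYLLABLES  0x0004
#endif

/* Hypothetical helper: return 1 if exactly one bit is set in dwFlags,
 * that is, if exactly one log type was requested, and 0 otherwise. */
static int is_single_log_type(DWORD dwFlags)
{
    /* A nonzero value has a single bit set when clearing its lowest
     * set bit leaves nothing behind. */
    return dwFlags != 0 && (dwFlags & (dwFlags - 1)) == 0;
}
```

An application could call is_single_log_type(dwFlags) and pass dwFlags to
TextToSpeechOpenLogFile() only when the result is 1.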

See Also

TextToSpeechCloseLogFile()

Creating a Log File

Special Text-To-Speech Modes
---------------------------------------------------------------------------

 TextToSpeechOpenWaveOutFile

This function opens the named file for speech output as a wave file.

Syntax

MMRESULT TextToSpeechOpenWaveOutFile    (LPTTS_HANDLE_T phTTS,
                                         LPSTR pszFileName,
                                         DWORD dwFormat)

Parameters

LPTTS_HANDLE_T phTTS                    Specifies a text-to-speech handle and
                                        identifies the opened text-to-speech
                                        device.
LPSTR pszFileName                       Specifies a pointer to a file name.
DWORD dwFormat                          Determines the audio sample format.
                                        It can be one of the following
                                        constants that are defined in include
                                        files mmsystem.h and ttsapi.h:

Constant                                Description
WAVE_FORMAT_11M08                       Mono, 8-bit, 11.025 kHz sample rate
WAVE_FORMAT_11M16                       Mono, 16-bit, 11.025 kHz sample rate
WAVE_FORMAT_08M08                       Mono, 8-bit μ-law, 8 kHz sample rate

Return Value

This function returns a value of type MMRESULT. The value is zero if the
function is successful. The return value is one of the following constants:

Constant                                Description
MMSYSERR_NOERROR                        Normal successful completion.
MMSYSERR_INVALPARAM                     An invalid parameter was passed.
                                        (Illegal wave output format.)
MMSYSERR_NOMEM                          Memory allocation error.
MMSYSERR_ALLOCATED                      A wave file is already open.
MMSYSERR_ERROR                          Unable to open the wave file. Unable
                                        to write to the wave file.
MMSYSERR_INVALHANDLE                    The text-to-speech handle was invalid.

Comments

If an application calls the TextToSpeechOpenWaveOutFile() function, all
subsequent calls to the TextToSpeechSpeak() function write the audio to a
wave file until the TextToSpeechCloseWaveOutFile() function is called. The
TextToSpeechStartup() function must be called to start the text-to-speech
system before calling this function.

See Also

TextToSpeechCloseWaveOutFile()

Creating a Wave File

Special Text-To-Speech Modes
---------------------------------------------------------------------------

 TextToSpeechPause

This function pauses text-to-speech audio output.

Syntax

MMRESULT TextToSpeechPause (LPTTS_HANDLE_T phTTS)

Parameters

LPTTS_HANDLE_T phTTS                    Specifies a text-to-speech handle
                                        identifying the opened text-to-speech
                                        device.

Return Value

This function returns a value of type MMRESULT. The value is zero if the
function is successful. The return value is one of the following constants:

Constant                                Description
MMSYSERR_NOERROR                        Normal successful completion.
MMSYSERR_INVALHANDLE                    The specified device handle is
                                        invalid. The system is not speaking
                                        or the text-to-speech handle is
                                        invalid.

Comments

This function only affects the audio output and will have no effect when
writing log files, wave files, or when using the speech-to-memory
capability of the text-to-speech system.

The text-to-speech system will remain paused until one of the following
functions is called:

*  TextToSpeechResume()

*  TextToSpeechSync()

*  TextToSpeechOpenInMemory()

*  TextToSpeechOpenLogFile()

*  TextToSpeechOpenWaveOutFile()

If the text-to-speech system is sharing the wave output (audio) device
(that is, OWN_AUDIO_DEVICE was NOT specified when the TextToSpeechStartup()
function started the text-to-speech system), and the TextToSpeechPause()
function is called while the system is speaking, the wave output device is
not released until one of the functions listed above is called and the
system finishes speaking, or until the TextToSpeechReset() function is
called. Note that the TextToSpeechReset() function does NOT resume audio
output if the text-to-speech system has been paused by the
TextToSpeechPause() function.

See Also

TextToSpeechResume()

Audio Output Control Functions
---------------------------------------------------------------------------

 TextToSpeechReset

This function flushes all previously queued text from the text-to-speech
system and stops any audio output.

Syntax

MMRESULT TextToSpeechReset              (LPTTS_HANDLE_T phTTS,
                                         BOOL bReset)

Parameters

LPTTS_HANDLE_T phTTS                    Specifies a text-to-speech handle
                                        identifying the opened text-to-speech
                                        device.
BOOL bReset                             Takes one of the following Boolean
                                        values:

Value                                   Description
FALSE                                   Preserves the current mode of the
                                        text-to-speech system.
TRUE                                    The text-to-speech system is returned
                                        to the startup state and any open
                                        text-to-speech files are closed. The
                                        one exception is that this function
                                        will NOT resume the text-to-speech
                                        system if it has been paused by the
                                        TextToSpeechPause() function.

Return Value

This function returns a value of type MMRESULT. The value is zero if the
function is successful. The return value is one of the following constants:

Constant                                Description
MMSYSERR_NOERROR                        Normal successful completion.
MMSYSERR_NOMEM                          Unable to allocate memory.
MMSYSERR_ERROR                          Unable to flush the system.
MMSYSERR_INVALHANDLE                    The text-to-speech handle was invalid.

Comments

If bReset has a value of TRUE and the application has called the
TextToSpeechOpenWaveOutFile() function or the TextToSpeechOpenLogFile()
function, the open file is closed. The TextToSpeechReset() function then
flushes all previously queued text and stops all audio output. If the
TextToSpeechOpenInMemory() function has enabled outputting the speech
samples to memory, then all queued TTS_BUFFER_T structures are returned to
the application by a message that is sent to the application's window
procedure. See the TextToSpeechOpenInMemory() function for more
information.

See Also

TextToSpeechStartup()

TextToSpeechShutdown()

Audio Output Control Functions
---------------------------------------------------------------------------

 TextToSpeechResume

This function resumes text-to-speech output after it has been paused by
calling the TextToSpeechPause() function.

Syntax

MMRESULT TextToSpeechResume (LPTTS_HANDLE_T phTTS)

Parameters

LPTTS_HANDLE_T phTTS                    Specifies a text-to-speech handle
                                        identifying the opened text-to-speech
                                        device.

Return Value

This function returns a value of type MMRESULT. The value is zero if the
function is successful. The return value is one of the following constants:

Constant                                Description
MMSYSERR_NOERROR                        Normal successful completion.
MMSYSERR_INVALHANDLE                    The system was not paused, or the
                                        text-to-speech handle was invalid.

Comments

This function only affects the audio output and has no effect when writing
log files, wave files, or when using the speech-to-memory capability of the
text-to-speech system.

See Also

TextToSpeechPause()

Audio Output Control Functions
---------------------------------------------------------------------------

 TextToSpeechReturnBuffer

This function returns the current buffer when an application is using the
text-to-speech system's speech-to-memory capability. The buffer can be
empty or partially full when it is returned. The dwBufferLength element of
the TTS_BUFFER_T structure contains the number of samples in the buffer. If
no buffer is available, then a NULL pointer is returned in ppTTSbuffer.

Syntax

MMRESULT TextToSpeechReturnBuffer       (LPTTS_HANDLE_T phTTS,
                                         LPTTS_BUFFER_T *ppTTSbuffer)

Parameters

LPTTS_HANDLE_T phTTS                    A pointer to a structure of type
                                        TTS_HANDLE_T.
LPTTS_BUFFER_T *ppTTSbuffer             The address of a pointer to a
                                        structure of type TTS_BUFFER_T.

Return Value

This function returns a value of type MMRESULT. The value is zero if the
function is successful. The return value is one of the following constants:

Constant                                Description
MMSYSERR_NOERROR                        Normal successful completion.
MMSYSERR_INVALPARAM                     Invalid parameter.
MMSYSERR_ERROR                          Output to memory not enabled or
                                        unable to create a system object.
MMSYSERR_INVALHANDLE                    The text-to-speech handle was invalid.

Comments

Most applications do not require this function because buffers are
automatically returned when filled or when a TTS_FORCE flag is passed in
the TextToSpeechSpeak() function. The TextToSpeechReturnBuffer() function
is provided so an application can return a buffer before it is filled and,
therefore, obtain more speech samples immediately.

See Also

TextToSpeechStartup()

TextToSpeechShutdown()
---------------------------------------------------------------------------

 TextToSpeechSetLanguage

This function selects a language for the text-to-speech system to use as
the default language.

Syntax

MMRESULT TextToSpeechSetLanguage        (LPTTS_HANDLE_T phTTS,
                                         LANGUAGE_T Language)

Parameters

LPTTS_HANDLE_T phTTS                    Specifies a text-to-speech handle
                                        identifying the opened text-to-speech
                                        device.
LANGUAGE_T Language                     Specifies a language. It must be one
                                        of the languages listed below.
                                        (Currently there is only one
                                        supported language.)

Constant                                Description
TTS_AMERICAN_ENGLISH                    Specifies American English. This
                                        symbol is defined in include file
                                        ttsapi.h.

Return Value

This function returns a value of type MMRESULT. The value is zero if the
function is successful. The return value is one of the following constants:

Constant                                Description
MMSYSERR_NOERROR                        Normal successful completion.
MMSYSERR_INVALPARAM                     An invalid parameter was passed.
MMSYSERR_INVALHANDLE                    The text-to-speech handle was invalid.

Comments

Currently, American English is the only supported language.

See Also

TextToSpeechGetLanguage()
---------------------------------------------------------------------------

 TextToSpeechSetRate

This function sets the text-to-speech speaking rate.

Syntax

MMRESULT TextToSpeechSetRate            (LPTTS_HANDLE_T phTTS,
                                        DWORD dwRate)

Parameters

LPTTS_HANDLE_T phTTS                    Specifies a text-to-speech handle
                                        identifying the opened text-to-speech
                                        device.
DWORD dwRate                            Sets the speaking rate. Valid values
                                        range from 75 to 600 words per
                                        minute.

Return Value

This function returns a value of type MMRESULT. The value is zero if the
function is successful. The return value is one of the following constants:

Constant                                Description
MMSYSERR_NOERROR                        Normal successful completion.
MMSYSERR_INVALPARAM                     An invalid parameter was passed.
MMSYSERR_INVALHANDLE                    The text-to-speech handle was invalid.

Comments

A change in speaking rate does not take effect until the next phrase
boundary. Audio queued before that phrase boundary is unaffected.
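
As a minimal sketch, an application might guard the documented range of 75
to 600 words per minute before calling TextToSpeechSetRate(). The names
clamp_speaking_rate, TTS_RATE_MIN, and TTS_RATE_MAX below are hypothetical
and are not part of the DECtalk Software API. Clamping is only one possible
policy; an application could instead pass the value through and check for
MMSYSERR_INVALPARAM.

```c
#include <assert.h>

/* Hypothetical names: these limits restate the documented valid range.
 * They are not symbols from ttsapi.h. */
#define TTS_RATE_MIN 75UL
#define TTS_RATE_MAX 600UL

/* Return the requested rate forced into the documented valid range. */
unsigned long clamp_speaking_rate(unsigned long requested)
{
    unsigned long rate = requested;

    if (rate < TTS_RATE_MIN)
        rate = TTS_RATE_MIN;
    if (rate > TTS_RATE_MAX)
        rate = TTS_RATE_MAX;

    assert(rate >= TTS_RATE_MIN && rate <= TTS_RATE_MAX);
    return rate;
}
```

The result would then be passed as the dwRate argument, for example
TextToSpeechSetRate(phTTS, clamp_speaking_rate(wanted)).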

See Also

TextToSpeechGetRate()
---------------------------------------------------------------------------

 TextToSpeechSetSpeaker

This function sets the voice of the speaker the text-to-speech system will
use.

Syntax

MMRESULT TextToSpeechSetSpeaker         (LPTTS_HANDLE_T phTTS, SPEAKER_T
                                        Speaker)

Parameters

LPTTS_HANDLE_T phTTS                    Specifies a text-to-speech handle
                                        identifying the opened text-to-speech
                                        device.
SPEAKER_T Speaker                       Selects a speaker from the following
                                        list. These values are defined in
                                        include file ttsapi.h.

Speaker                                 Description
PAUL                                    Default (male) voice
HARRY                                   Full male voice
FRANK                                   Aged male voice
DENNIS                                  Male voice
BETTY                                   Full female voice
URSULA                                  Aged female voice
WENDY                                   Whispering female voice
RITA                                    Female voice
KIT                                     Child's voice

Return Value

This function returns a value of type MMRESULT. The value is zero if the
function is successful. The return value is one of the following constants:

Constant                                Description
MMSYSERR_NOERROR                        Normal successful completion.
MMSYSERR_INVALPARAM                     An invalid parameter was passed.
MMSYSERR_INVALHANDLE                    The text-to-speech handle was invalid.

Comments

A change in speaking voice does not take effect until the next phrase
boundary. Audio queued before that phrase boundary is unaffected.

See Also

TextToSpeechGetSpeaker()
---------------------------------------------------------------------------

 TextToSpeechShutdown

This function shuts down the text-to-speech system and frees all system
resources used by the text-to-speech system.

Syntax

MMRESULT TextToSpeechShutdown (LPTTS_HANDLE_T phTTS)

Parameters

LPTTS_HANDLE_T phTTS                    Specifies a text-to-speech handle
                                        identifying the opened text-to-speech
                                        device.

Return Value

This function returns a value of type MMRESULT. The value is zero if the
function is successful. The return value is one of the following constants:

Constant                                Description
MMSYSERR_NOERROR                        Normal successful completion.
MMSYSERR_INVALHANDLE                    The text-to-speech handle was invalid.

Comments

Call this function when you close an application. Any previously loaded
user-defined dictionaries are automatically unloaded. All previously queued
text is discarded, and the text-to-speech system immediately stops
speaking.

See Also

TextToSpeechStartup()
---------------------------------------------------------------------------

 TextToSpeechSpeak

This function queues a null-terminated string to the text-to-speech system.

Syntax

MMRESULT TextToSpeechSpeak              (LPTTS_HANDLE_T phTTS,
                                        LPSTR pszTextString,
                                        DWORD dwFlags)

Parameters

LPTTS_HANDLE_T phTTS                    Specifies a text-to-speech handle
                                        identifying the opened text-to-speech
                                        device.
LPSTR pszTextString                     Specifies a pointer to a
                                        null-terminated string of characters
                                        to be queued.
DWORD dwFlags                           Specifies whether the text is to be
                                        pushed through the text-to-speech
                                        system even if it does NOT end on a
                                        clause boundary. It can be set to one
                                        of the following constants defined in
                                        include file ttsapi.h:

Constant                                Description
TTS_NORMAL                              Insert characters in the
                                        text-to-speech queue.
TTS_FORCE                               Insert characters in the
                                        text-to-speech queue and force all
                                        text to be output even if the text
                                        stream does NOT end on a clause
                                        boundary.

Return Value

This function returns a value of type MMRESULT. The value is zero if the
function is successful. The return value is one of the following constants:

Constant                                Description
MMSYSERR_NOERROR                        Normal successful completion.
MMSYSERR_NOMEM                          Unable to allocate memory.
MMSYSERR_INVALHANDLE                    The text-to-speech handle was invalid.

Comments

The speaker, speaking rate, and volume can also be changed in the text
string by inserting voice-control commands as shown in the following
example:

[:name paul] I am Paul. [:nb] I am Betty. [:volume set 50] The volume has
been set to 50% of the maximum level. [:ra 120] I am speaking at 120 words
per minute.
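
A string like the one above can also be assembled at run time. The sketch
below prefixes a piece of text with the [:name ...] and [:ra ...] commands
shown in the example. The helper name build_spoken_text is hypothetical and
not part of the DECtalk Software API, and using [:name ...] with voices
other than paul is an assumption based on the example above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of the DECtalk API.
 * Writes "[:name <voice>] [:ra <rate>] <text>" into buf.
 * Returns 0 on success, -1 if voice is empty or the result did not fit. */
int build_spoken_text(char *buf, size_t size, const char *voice,
                      unsigned int rate, const char *text)
{
    int n;

    assert(buf != NULL && voice != NULL && text != NULL);
    if (strlen(voice) == 0)
        return -1;

    n = snprintf(buf, size, "[:name %s] [:ra %u] %s", voice, rate, text);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

The resulting buffer would be passed as pszTextString to the
TextToSpeechSpeak() function.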

See Also

About TextToSpeechSpeak()

---------------------------------------------------------------------------

 TextToSpeechStartup

This function initializes the text-to-speech system and returns a value of
type MMRESULT. This value is zero if initialization was successful. A
single process can run only one instance of DECtalk.

Syntax

MMRESULT TextToSpeechStartup            (HWND hWnd,
                                        LPTTS_HANDLE_T *phTTS,
                                        UINT uiDeviceNumber,
                                        DWORD dwDeviceOptions,
                                        VOID (*DTCallbackRoutine) (),
                                        LONG dwDTCallbackParameter)

Parameters

HWND hWnd                               A handle to the parent window. This
                                        can be NULL.
LPTTS_HANDLE_T *phTTS                   A pointer to a pointer to a structure
                                        of type TTS_HANDLE_T.
UINT uiDeviceNumber                     Specifies a device number of the wave
                                        output device. A value of WAVE_MAPPER
                                        can be used to select the first
                                        available device.
DWORD dwDeviceOptions                   Specifies how the wave output device
                                        is managed. It can be a combination
                                        of the following constants defined in
                                        include file ttsapi.h:

Constants                               Description
OWN_AUDIO_DEVICE                        The wave output device is opened. No
                                        other process can allocate the wave
                                        output device until the
                                        TextToSpeechShutdown() function is
                                        called.
                                        If OWN_AUDIO_DEVICE is NOT specified,
                                        the wave output device is opened
                                        after audio is queued by the
                                        TextToSpeechSpeak() function. The
                                        wave output device is released when
                                        the text-to-speech system has
                                        completed speaking.
DO_NOT_USE_AUDIO_DEVICE                 The text-to-speech system can only be
                                        used to write wave files, write
                                        speech samples to memory, or to write
                                        log files. No error is returned if a
                                        wave output device is not present.
OUTPUT_TO_MME_DEVICE                    This flag no longer needs to be
                                        specified. It is still available for
                                        compatibility with previous versions
                                        of DECtalk Software.

VOID (*DTCallbackRoutine)()
Specifies a callback routine. DECtalk Software uses the callback routine
to inform the application when the buffer is full (if DECtalk Software
in-memory calls are used) or when the TextToSpeechSpeak() function
encounters an index mark.

Pass NULL if no callback routine is needed.

LONG dwDTCallbackParameter
A pointer to a user-specified parameter. It is used to pass parameters
into the callback routine.

Pass NULL if no user-specified parameters are needed.

Return Value

This function returns a value of type MMRESULT. The value is zero if the
function is successful. The return value is one of the following constants:

Constant                                Description
MMSYSERR_NOERROR                        Normal successful completion.
MMSYSERR_NODRIVER                       No wave output device present.
MMSYSERR_NOMEM                          Memory allocation error.
MMSYSERR_ERROR                          DECtalk dictionary not found.
MMSYSERR_BADDEVICEID                    Device ID out of range.

Comments

The default parameters are:

Language: American English.
Speaking rate: 180 words per minute.
Speaker: Paul.
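
The following is a minimal sketch of one possible call sequence: start up
with the defaults, queue one string, wait for it to finish, and shut down.
To keep the sketch compilable without the kit, the types, constants, and
functions in the first section are stand-in stubs with placeholder values.
They are not the kit's definitions, and a real program would include
ttsapi.h instead. The function name speak_and_wait is hypothetical, and
passing 0 for dwDeviceOptions (no flags set) is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* ------------------------------------------------------------------
 * STAND-INS ONLY. Everything in this section is a placeholder so the
 * sketch compiles without the DECtalk kit. In a real program, remove
 * this section and include ttsapi.h. Numeric values are arbitrary
 * unless noted otherwise.
 * ------------------------------------------------------------------ */
typedef unsigned int  MMRESULT;
typedef unsigned int  UINT;
typedef unsigned long DWORD;
typedef long          LONG;
typedef void         *HWND;
typedef char         *LPSTR;
typedef struct { int speaking; } TTS_HANDLE_T, *LPTTS_HANDLE_T;

#define MMSYSERR_NOERROR     0u   /* documented as zero */
#define MMSYSERR_INVALHANDLE 1u   /* placeholder value */
#define WAVE_MAPPER          0u   /* placeholder value */
#define TTS_FORCE            1ul  /* placeholder value */

static TTS_HANDLE_T stub_device;

static MMRESULT TextToSpeechStartup(HWND hWnd, LPTTS_HANDLE_T *phTTS,
                                    UINT uiDeviceNumber,
                                    DWORD dwDeviceOptions,
                                    void (*callback)(void),
                                    LONG callbackParameter)
{
    (void)hWnd; (void)uiDeviceNumber; (void)dwDeviceOptions;
    (void)callback; (void)callbackParameter;
    *phTTS = &stub_device;
    return MMSYSERR_NOERROR;
}

static MMRESULT TextToSpeechSpeak(LPTTS_HANDLE_T phTTS, LPSTR text,
                                  DWORD flags)
{
    (void)text; (void)flags;
    if (phTTS == NULL)
        return MMSYSERR_INVALHANDLE;
    phTTS->speaking = 1;
    return MMSYSERR_NOERROR;
}

static MMRESULT TextToSpeechSync(LPTTS_HANDLE_T phTTS)
{
    if (phTTS == NULL)
        return MMSYSERR_INVALHANDLE;
    phTTS->speaking = 0;
    return MMSYSERR_NOERROR;
}

static MMRESULT TextToSpeechShutdown(LPTTS_HANDLE_T phTTS)
{
    return (phTTS == NULL) ? MMSYSERR_INVALHANDLE : MMSYSERR_NOERROR;
}
/* ----------------------- end of stand-ins ------------------------- */

/* Speak one string and wait for it to finish.
 * Returns the first non-zero MMRESULT seen, or MMSYSERR_NOERROR. */
MMRESULT speak_and_wait(char *text)
{
    LPTTS_HANDLE_T phTTS = NULL;
    MMRESULT status;
    MMRESULT shutdown_status;

    /* No parent window, first available device, no callback. */
    status = TextToSpeechStartup(NULL, &phTTS, WAVE_MAPPER, 0, NULL, 0);
    if (status != MMSYSERR_NOERROR)
        return status;

    /* TTS_FORCE pushes the text out even without a clause boundary. */
    status = TextToSpeechSpeak(phTTS, text, TTS_FORCE);
    if (status == MMSYSERR_NOERROR)
        status = TextToSpeechSync(phTTS);  /* block until processed */

    /* Always shut down, but report the first error that occurred. */
    shutdown_status = TextToSpeechShutdown(phTTS);
    if (status == MMSYSERR_NOERROR)
        status = shutdown_status;
    return status;
}
```

Calling TextToSpeechSync() before TextToSpeechShutdown() matters in this
sequence because shutting down discards any text that is still queued.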

See Also

TextToSpeechShutdown()

---------------------------------------------------------------------------

 Loading of the Main Pronunciation Dictionary

The TextToSpeechStartup() function loads the DECtalk main pronunciation
dictionary, dectalk.dic, from the directory /usr/lib/dtk/.

If the dictionary file cannot be found there, the TextToSpeechStartup()
function returns a value of MMSYSERR_ERROR.
---------------------------------------------------------------------------

 Loading of the User Dictionary

The TextToSpeechStartup() function attempts to load a user-specified
pronunciation dictionary from the user's login home directory. When
started, DECtalk Software loads the default user dictionary, user.dic, if
it is available.

If the dictionary file cannot be found there, the TextToSpeechStartup()
function attempts to load the user dictionary from the application's
default directory. If this second attempt also fails, no user dictionary
is loaded.
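
The search order described above can be illustrated with a small sketch.
DECtalk Software performs this lookup itself; the helper below only shows
which paths would be tried and in what order. The name user_dic_candidate
is hypothetical, and because this guide does not say how the application's
default directory is determined, the caller supplies it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of the DECtalk API.
 * Fills buf with the candidate path for search step 0 or 1:
 *   step 0: <home>/user.dic     (user's login home directory)
 *   step 1: <app_dir>/user.dic  (application's default directory)
 * Returns 0 on success, -1 on a bad step, an empty directory,
 * or a path that does not fit in buf. */
int user_dic_candidate(int step, const char *home, const char *app_dir,
                       char *buf, size_t size)
{
    const char *dir;
    int n;

    assert(buf != NULL);
    if (step == 0)
        dir = home;
    else if (step == 1)
        dir = app_dir;
    else
        return -1;

    if (dir == NULL || strlen(dir) == 0)
        return -1;

    n = snprintf(buf, size, "%s/user.dic", dir);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

A caller would try step 0 first and fall back to step 1 only if the first
file does not exist, which mirrors the two attempts described above.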

See Also

TextToSpeechLoadUserDictionary()

TextToSpeechUnloadUserDictionary()
---------------------------------------------------------------------------

 TextToSpeechSync

This function blocks until all previously queued text has been processed.
This function automatically resumes audio output if the text-to-speech
system has been paused by the TextToSpeechPause() function.

Syntax

MMRESULT TextToSpeechSync (LPTTS_HANDLE_T phTTS)

Parameters

LPTTS_HANDLE_T phTTS                    Specifies a text-to-speech handle
                                        identifying the opened text-to-speech
                                        device.

Return Value

This function returns a value of type MMRESULT. The value is zero if the
function is successful. The return value is one of the following constants:

Constants                               Description
MMSYSERR_NOERROR                        Normal successful completion.
MMSYSERR_ERROR                          Unable to complete queued text.
MMSYSERR_INVALHANDLE                    The text-to-speech handle was invalid.

Comments

This function automatically resumes audio output if the text-to-speech
system was paused by a previous call to the TextToSpeechPause() function.
---------------------------------------------------------------------------

 TextToSpeechUnloadUserDictionary

This function unloads a user dictionary. You must unload any previously
loaded dictionary before you can load a new one. That is, only one user
dictionary can be loaded at a time.

Syntax

MMRESULT TextToSpeechUnloadUserDictionary (LPTTS_HANDLE_T phTTS)

Parameters

LPTTS_HANDLE_T phTTS                    Specifies a text-to-speech handle
                                        identifying the opened text-to-speech
                                        device.

Return Value

This function returns a value of type MMRESULT. The value is zero if the
function is successful. The return value is one of the following constants:

Constants                               Description
MMSYSERR_NOERROR                        Normal successful completion.
MMSYSERR_INVALHANDLE                    The text-to-speech handle was invalid.

Comments

A user dictionary is created using the User Dictionary Build tool. See
Creating a user dictionary.

See Also

TextToSpeechLoadUserDictionary()

